A house value is simply more than location and square footage. Like the features that make up a person, an educated party would want to know all aspects that give a house its value. For example, you want to sell a house and you don’t know the price which you may expect — it can’t be too low or too high. To find house price you usually try to find similar properties in your neighborhood and based on gathered data you will try to assess your house price.
The objective of this problem statement is to predict the housing prices of a town, or a suburb, based on the features of the locality provided and identify the most important features to consider while predicting the prices.
This dataset has 23 features, price being the target variable. The details of all the features are given below:
#Importing libraries
import pandas as pd
import os
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
sns.set(color_codes =True )
import scipy as sp
from scipy.stats import chi2_contingency
import warnings
warnings.filterwarnings("ignore")
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
import statsmodels.stats.api as sms
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
#Removes the limit from the number of displayed columns and rows.
#This is so I can see the entire dataframe when I print it
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 200)
#Importing dataset
os.chdir(r'C:\Users\dhuds\Downloads')
data = pd.read_excel('innercity.xlsx')
df=data.copy()
#Dimensions of df
print(f'The dataset has {df.shape[0]} rows and {df.shape[1]} columns.')
The dataset has 21613 rows and 23 columns.
#Observing a randoms sample of 10 observations
np.random.seed(1)
df.sample(n=10)
| cid | dayhours | price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | zipcode | lat | long | living_measure15 | lot_measure15 | furnished | total_area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15544 | 8718500275.000 | 20140715T000000 | 390000.000 | 3.000 | 2.750 | 1950.000 | 12240.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1250.000 | 700.000 | 1956.000 | 0.000 | 98028.000 | 47.740 | -122.258 | 1880.000 | 12000.000 | 0.000 | 14190.000 |
| 17454 | 9485920120.000 | 20140829T000000 | 290000.000 | 4.000 | 2.500 | 2340.000 | 52272.000 | 2.000 | 0.000 | 0.000 | 2.000 | 8.000 | 2340.000 | 0.000 | 1978.000 | 0.000 | 98042.000 | 47.347 | -122.091 | 2480.000 | 40500.000 | 0.000 | 54612.000 |
| 21548 | 1310430400.000 | 20140513T000000 | 455000.000 | 4.000 | 2.500 | 3360.000 | 7685.000 | 2.000 | 0.000 | 0.000 | 3.000 | 9.000 | 3360.000 | 0.000 | 2001.000 | 0.000 | 98058.000 | 47.437 | -122.111 | 3060.000 | 6567.000 | 1.000 | 11045.000 |
| 3427 | 4109600195.000 | 20140718T000000 | 524000.000 | 4.000 | 2.750 | 2310.000 | 5000.000 | 1.500 | 0.000 | 0.000 | 5.000 | 8.000 | 1480.000 | 830.000 | 1908.000 | 0.000 | 98118.000 | 47.550 | -122.268 | 1100.000 | 5000.000 | 0.000 | 7310.000 |
| 8809 | 7282300095.000 | 20140709T000000 | 295000.000 | 2.000 | 1.000 | 800.000 | 6500.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 800.000 | 0.000 | 1953.000 | 0.000 | 98133.000 | 47.762 | -122.358 | 1220.000 | 7000.000 | 0.000 | 7300.000 |
| 3294 | 475000510.000 | 20141118T000000 | 594000.000 | 3.000 | 1.000 | 1320.000 | 5000.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 1090.000 | 230.000 | 1920.000 | 0.000 | 98107.000 | 47.667 | -122.365 | 1700.000 | 5000.000 | 0.000 | 6320.000 |
| 275 | 4046710180.000 | 20150325T000000 | 660000.000 | 3.000 | 3.500 | 3600.000 | 37982.000 | 2.000 | 0.000 | 0.000 | 4.000 | 8.000 | 3600.000 | 0.000 | 1996.000 | 0.000 | 98014.000 | 47.698 | -121.917 | 2050.000 | 18019.000 | 0.000 | 41582.000 |
| 8736 | 538000030.000 | 20140730T000000 | 272500.000 | 3.000 | 2.000 | 1540.000 | 6250.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1540.000 | 0.000 | 1998.000 | 0.000 | 98038.000 | 47.354 | -122.025 | 2070.000 | 6250.000 | 0.000 | 7790.000 |
| 6161 | 2724200705.000 | 20141212T000000 | 95000.000 | 2.000 | 1.000 | 800.000 | 8550.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 800.000 | 0.000 | 1947.000 | 0.000 | 98198.000 | 47.407 | -122.294 | 1490.000 | 8550.000 | 0.000 | 9350.000 |
| 19832 | 7771300085.000 | 20150309T000000 | 411500.000 | 3.000 | 1.000 | 1130.000 | 8159.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 1130.000 | 0.000 | 1954.000 | 0.000 | 98133.000 | 47.736 | -122.333 | 1570.000 | 8162.000 | 0.000 | 9289.000 |
df.drop(columns = ['cid'], inplace = True)
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 21613 entries, 0 to 21612 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 dayhours 21613 non-null object 1 price 21613 non-null float64 2 room_bed 21505 non-null float64 3 room_bath 21505 non-null float64 4 living_measure 21596 non-null float64 5 lot_measure 21571 non-null float64 6 ceil 21571 non-null object 7 coast 21612 non-null object 8 sight 21556 non-null float64 9 condition 21556 non-null object 10 quality 21612 non-null float64 11 ceil_measure 21612 non-null float64 12 basement 21612 non-null float64 13 yr_built 21612 non-null object 14 yr_renovated 21613 non-null float64 15 zipcode 21613 non-null float64 16 lat 21613 non-null float64 17 long 21613 non-null object 18 living_measure15 21447 non-null float64 19 lot_measure15 21584 non-null float64 20 furnished 21584 non-null float64 21 total_area 21584 non-null object dtypes: float64(15), object(7) memory usage: 3.6+ MB
print(df[df.duplicated()].count()) #Reporting the number of duplicates
df[df.duplicated()]
df.drop_duplicates(inplace = True)
dayhours 0 price 0 room_bed 0 room_bath 0 living_measure 0 lot_measure 0 ceil 0 coast 0 sight 0 condition 0 quality 0 ceil_measure 0 basement 0 yr_built 0 yr_renovated 0 zipcode 0 lat 0 long 0 living_measure15 0 lot_measure15 0 furnished 0 total_area 0 dtype: int64
# Unique values
columns = df.columns
for col in columns:
print('Unique Values of {} are \n'.format(col),df[col].unique())
print('*'*100)
Unique Values of dayhours are ['20150427T000000' '20150317T000000' '20140820T000000' '20141010T000000' '20150218T000000' '20140709T000000' '20140715T000000' '20140618T000000' '20140721T000000' '20141028T000000' '20140502T000000' '20150123T000000' '20141029T000000' '20150311T000000' '20140822T000000' '20140522T000000' '20140819T000000' '20150428T000000' '20150212T000000' '20140909T000000' '20141125T000000' '20140818T000000' '20140719T000000' '20141230T000000' '20150112T000000' '20140829T000000' '20150501T000000' '20140811T000000' '20141103T000000' '20140620T000000' '20141203T000000' '20141030T000000' '20141208T000000' '20140903T000000' '20140610T000000' '20150422T000000' '20140625T000000' '20150222T000000' '20150113T000000' '20140923T000000' '20140624T000000' '20150128T000000' '20141014T000000' '20150316T000000' '20140912T000000' '20140606T000000' '20141017T000000' '20140827T000000' '20150225T000000' '20140707T000000' '20150403T000000' '20140712T000000' '20140603T000000' '20140508T000000' '20141009T000000' '20141027T000000' '20140801T000000' '20150304T000000' '20150310T000000' '20140905T000000' '20150205T000000' '20140612T000000' '20141007T000000' '20150507T000000' '20141126T000000' '20140804T000000' '20140714T000000' '20140922T000000' '20141218T000000' '20140814T000000' '20140626T000000' '20141008T000000' '20140815T000000' '20140812T000000' '20140711T000000' '20140904T000000' '20141121T000000' '20140520T000000' '20150409T000000' '20150325T000000' '20141211T000000' '20140627T000000' '20141204T000000' '20141006T000000' '20150312T000000' '20140506T000000' '20150130T000000' '20140613T000000' '20150211T000000' '20141107T000000' '20141120T000000' '20140708T000000' '20141114T000000' '20141118T000000' '20150303T000000' '20150408T000000' '20150209T000000' '20150207T000000' '20150213T000000' '20141205T000000' '20140530T000000' '20150115T000000' '20140623T000000' '20141216T000000' '20150129T000000' '20141021T000000' '20150402T000000' '20140617T000000' '20150224T000000' '20140604T000000' '20150320T000000' '20140924T000000' '20150410T000000' '20140619T000000' '20150227T000000' '20140926T000000' '20150309T000000' '20140723T000000' '20140710T000000' '20141210T000000' '20140602T000000' '20141113T000000' '20140825T000000' '20140523T000000' '20141023T000000' '20150504T000000' '20140917T000000' '20150321T000000' '20140908T000000' '20150106T000000' '20150120T000000' '20140726T000000' '20150407T000000' '20140806T000000' '20140724T000000' '20140808T000000' '20150424T000000' '20141013T000000' '20150414T000000' '20140519T000000' '20140512T000000' '20150508T000000' '20150324T000000' '20140826T000000' '20141222T000000' '20150423T000000' '20150326T000000' '20141229T000000' '20150401T000000' '20150226T000000' '20141215T000000' '20141001T000000' '20140701T000000' '20140813T000000' '20140718T000000' '20150327T000000' '20141217T000000' '20140609T000000' '20141124T000000' '20150220T000000' '20140513T000000' '20141105T000000' '20141002T000000' '20150406T000000' '20141112T000000' '20150420T000000' '20140821T000000' '20150331T000000' '20140509T000000' '20140729T000000' '20141110T000000' '20141206T000000' '20140611T000000' '20141202T000000' '20140725T000000' '20150511T000000' '20141022T000000' '20150122T000000' '20140731T000000' '20140630T000000' '20140717T000000' '20150305T000000' '20150223T000000' '20140507T000000' '20150313T000000' '20150318T000000' '20141024T000000' '20150330T000000' '20141223T000000' '20150306T000000' '20150319T000000' '20150116T000000' '20150421T000000' '20141201T000000' '20141015T000000' '20150430T000000' '20140911T000000' '20141016T000000' '20140805T000000' '20150219T000000' '20141020T000000' '20140521T000000' '20140915T000000' '20150505T000000' '20150107T000000' '20150512T000000' '20150323T000000' '20150102T000000' '20150513T000000' '20140515T000000' '20140527T000000' '20150124T000000' '20150429T000000' '20140605T000000' '20140910T000000' '20150426T000000' '20140505T000000' '20140902T000000' '20141119T000000' '20150121T000000' '20140616T000000' '20150126T000000' '20140702T000000' '20140514T000000' '20150416T000000' '20140925T000000' '20150506T000000' '20150417T000000' '20141212T000000' '20141003T000000' '20140529T000000' '20140828T000000' '20150413T000000' '20141219T000000' '20150524T000000' '20150127T000000' '20140703T000000' '20140930T000000' '20140516T000000' '20150302T000000' '20141111T000000' '20140807T000000' '20150301T000000' '20140730T000000' '20140706T000000' '20140919T000000' '20141226T000000' '20150204T000000' '20141031T000000' '20141231T000000' '20140918T000000' '20140528T000000' '20140608T000000' '20140716T000000' '20140916T000000' '20141104T000000' '20150105T000000' '20140929T000000' '20150202T000000' '20141224T000000' '20150206T000000' '20140722T000000' '20150108T000000' '20141117T000000' '20150509T000000' '20150425T000000' '20150114T000000' '20141128T000000' '20140526T000000' '20141106T000000' '20140728T000000' '20150214T000000' '20140504T000000' '20140518T000000' '20141220T000000' '20150415T000000' '20150228T000000' '20140906T000000' '20150109T000000' '20141122T000000' '20140628T000000' '20141209T000000' '20150203T000000' '20150131T000000' '20141011T000000' '20140524T000000' '20150125T000000' '20150419T000000' '20150514T000000' '20140810T000000' '20150210T000000' '20140817T000000' '20141018T000000' '20140920T000000' '20150217T000000' '20141123T000000' '20141207T000000' '20150503T000000' '20140809T000000' '20140921T000000' '20141101T000000' '20150119T000000' '20141109T000000' '20140713T000000' '20150412T000000' '20140601T000000' '20140629T000000' '20141129T000000' '20140615T000000' '20150411T000000' '20140621T000000' '20140803T000000' '20150322T000000' '20140901T000000' '20140927T000000' '20140824T000000' '20150510T000000' '20150328T000000' '20150329T000000' '20150404T000000' '20140816T000000' '20140704T000000' '20140913T000000' '20140525T000000' '20140614T000000' '20150307T000000' '20150502T000000' '20140720T000000' '20140531T000000' '20140622T000000' '20140727T000000' '20140510T000000' '20140705T000000' '20141214T000000' '20140928T000000' '20140831T000000' '20141227T000000' '20140802T000000' '20140823T000000' '20141004T000000' '20141026T000000' '20141108T000000' '20140914T000000' '20150216T000000' '20150405T000000' '20141012T000000' '20150314T000000' '20150418T000000' '20150201T000000' '20150315T000000' '20141213T000000' '20141025T000000' '20150515T000000' '20140607T000000' '20141116T000000' '20140503T000000' '20140511T000000' '20140907T000000' '20150110T000000' '20150221T000000' '20141221T000000' '20141005T000000' '20150215T000000' '20140517T000000' '20150527T000000' '20150117T000000' '20141102T000000' '20141115T000000' '20150308T000000' '20141019T000000' '20141130T000000' '20140830T000000'] **************************************************************************************************** Unique Values of price are [600000. 190000. 735000. ... 725126. 332100. 685530.] **************************************************************************************************** Unique Values of room_bed are [ 4. 2. 3. 1. 5. 6. nan 7. 10. 8. 0. 9. 33. 11.] **************************************************************************************************** Unique Values of room_bath are [1.75 1. 2.75 2.5 1.5 3.5 2. 2.25 3. 4. 3.25 3.75 nan 5. 0.75 5.5 4.25 4.5 4.75 8. 6.75 5.25 6. 0. 1.25 5.75 7.5 6.5 0.5 7.75 6.25] **************************************************************************************************** Unique Values of living_measure are [3050. 670. 3040. ... 1405. 1295. 2253.] **************************************************************************************************** Unique Values of lot_measure are [ 9440. 3101. 2415. ... 12369. 2332. 60467.] **************************************************************************************************** Unique Values of ceil are [1.0 2.0 3.0 1.5 2.5 '$' nan 3.5] **************************************************************************************************** Unique Values of coast are [0.0 1.0 '$' nan] **************************************************************************************************** Unique Values of sight are [ 0. 4. 2. 3. 1. nan] **************************************************************************************************** Unique Values of condition are [3.0 4.0 5.0 2.0 nan 1.0 '$'] **************************************************************************************************** Unique Values of quality are [ 8. 6. 7. 10. 9. 5. 11. 13. 4. 12. 1. 3. nan] **************************************************************************************************** Unique Values of ceil_measure are [1800. 670. 3040. 1740. 1120. 1610. 1140. 3360. 3100. 1460. 3310. 1520. 2450. 1390. 910. 1450. 1040. 1200. 1570. 1940. 2170. 1750. 1430. 1220. 2790. 720. 1180. 950. 2470. 3220. 900. 1420. 1870. 2920. 1100. 1440. 2360. 3000. 1130. 2156. 1170. 1300. 1340. 2820. 2290. 1070. 800. 1550. 2200. 1230. 2305. 1270. 1540. 1890. 1060. 1620. 2670. 1980. 1760. 1150. 1470. 1160. 2270. 3500. 1950. 750. 3290. 3240. 1320. 1769. 1790. 830. 1290. 1010. 1910. 2110. 2460. 3550. 1330. 1280. 1480. 2040. 2150. 3140. 1370. 1400. 2740. 2020. 2530. 1260. 3370. 870. 1500. 1240. 1810. 2190. 890. 630. 1410. 2180. 3460. 4160. 1830. 1380. 1530. 2610. 3180. 880. 1690. 1930. 1190. 1600. 2240. 2210. 2750. 3440. 1000. 5110. 1757. 1900. 1050. 3480. 740. 3820. 1990. 970. 1490. 3080. 2390. 2680. 2100. 620. 7320. 2050. 860. 6050. 1920. 2580. 1210. 810. 1680. 1080. 1560. 700. 840. 1030. 4140. 2590. 1250. 1660. 710. 730. 1970. 3950. 1590. 1310. 1580. 2990. 1770. 1350. 780. 2070. 2400. 2120. 2250. 1020. 2420. 3960. 4000. 590. 2090. 3600. 2660. 2080. 1650. 980. 3190. 3110. 2380. 1640. 2440. 1670. 2570. 1360. 960. 4360. 3350. 3300. 3320. 820. 1730. 2330. 2000. 640. 2340. 2641. 4480. 2410. 2507. 2540. 1090. 3720. 1824. 2550. 2840. 3630. 3380. 1840. 4320. 3150. 460. 3270. 3330. 2230. 4850. 1820. 3700. 2168. 3010. 2910. 770. 990. 1780. 1700. 920. 2430. 3520. 2650. 2370. 3410. 3200. 4060. 3260. 1110. 3060. 2960. 2130. 1720. 3030. 2280. 2890. 2870. 2600. 2710. 4280. 2930. 5190. 2800. 3800. 3810. 4230. 3840. 940. 2830. 2510. 1630. 2480. 3540. 2160. 1510. 2350. 2320. 2720. 4620. 2770. 1960. 2475. 3900. 4390. 2520. 3750. 765. 2630. 660. 930. 4560. 3470. 2690. 4790. 2760. 850. 3860. 1656. 3400. 3590. 2640. 790. 1850. 3730. 680. 2780. 3020. 1710. 4960. 1088. 2700. 1392. 2860. 2900. 2810. 3430. 1880. 2030. 3620. 3510. 3340. 760. 1983. 2730. 5020. 2310. 2300. 4115. 4760. 3070. 4370. 6720. 4430. 3090. 2260. 8570. 3910. 3223. 2010. 3830. 2220. 2849. 5400. 1678. 2155. 1676. 3680. 3390. 5080. 3660. 3570. 1646. 4820. 550. 3780. 2653. 4720. 1766. 2940. 1995. 2669. 2732. 1833. 2140. 2850. 1860. 2560. 2060. 3530. 2714. 1095. 3250. 4270. 1502. 2980. 1255. 3420. 2403. 3222. 4180. 2196. 2620. 3160. 3670. 3650. 4750. 2692. 4040. 2811. 4150. 2025. 6430. 3206. 4110. 3230. 2632. 1413. 3120. 580. 4930. 2970. 1651. 4083. 1479. 4290. 2950. 4210. 2166. 3170. 570. 2803. 600. 3610. 2031. 4590. 1679. 4130. 480. 4440. 3280. 1419. 5330. 2885. 1414. 1654. 2303. 1785. 3580. 3450. 2708. 2891. 2500. 2244. 610. 1564. 2880. 5130. 3560. 3595. 4070. 470. 3940. 3605. 2286. 3690. 4120. 3980. 4570. 5480. 3485. 1363. 2490. 3640. 2153. 2242. 3790. 3130. 3880. 4170. 2689. 1715. 1802. 5230. 2789. 4810. 4350. 4260. 1763. 4660. 1159. 1608. 1427. 4890. 2544. 1494. 1889. 4010. 2744. 3210. 290. 4470. 1864. 2623. 4100. 2718. 4610. 5180. 1002. 3336. 7880. 4030. 1961. 490. 2398. 3930. 2755. 1552. 1578. 1934. 3181. 4410. 4250. 1765. 4740. 4090. 1347. 2393. 3065. 2009. 690. 3050. 6660. 3504. 650. 1987. 3770. 2375. 4475. 4900. 3172. 5000. 1752. 2185. 2283. 2434. 4950. 4300. 1852. 1358. 1584. 2738. 1116. 4310. 2502. 806. 3920. 4340. 5440. 2807. 5060. 2496. 1352. 2298. 2095. 4500. 1175. 520. 3870. 4190. 2734. 1322. 4080. 5430. 3970. 3266. 3569. 2267. 1131. 2201. 1954. 1313. 4380. 3006. 3361. 1778. 560. 1408. 3276. 4020. 1355. 1652. 4050. 3850. 2432. 6370. 1689. 3740. 1122. 798. 1129. 1808. 2659. 1867. 2683. 998. 2154. 4400. 5320. 3490. 1794. 370. 2655. 2301. 1996. 5300. 2864. 2029. 1981. 2414. 1008. 2612. 2726. 7420. 4670. 2052. 4240. 4580. 1509. 1914. 3760. 3028. 2815. 1239. 1975. 5050. 3845. 4540. 2105. 5240. 1728. 2481. 1232. 1341. 530. 4220. 3281. 1658. 4910. 7850. 9410. 2093. 1899. 1657. 3526. 5990. 3831. 1048. 3710. 1068. 1747. 2575. 1384. 1982. 3695. 5370. 1544. 3488. 6110. 4640. 1288. 2979. 4460. 1912. 2007. 1798. 4700. 3316. 2555. 2906. 1847. 809. 1604. 2441. 2217. 1453. 2026. 2223. 2671. 1812. 5710. 4840. 390. 4780. 2038. 2074. 4490. 2961. 2075. 2927. 6350. 2229. 4860. 4980. 1256. 3890. 2828. 5830. 2456. 3855. 2437. 2495. 3087. 1677. 3990. 2425. 2588. 3192. 1076. 2483. 1446. 2675. 8860. 5250. 2478. 7680. 4073. 2145. 5153. 3004. 430. 1264. 2529. 3906. 1165. 3828. 2643. 6220. 2796. 500. 1553. 1275. 1405. 2587. 2406. 1072. 2423. 2245. 380. 2658. 4200. 5760. 2424. 3045. 1788. 4225. 6070. 3135. 4830. 2966. 540. 893. 1144. 1444. 1295. 440. 2678. 4510. 2701. 1746. 3284. 2019. 2519. 6200. 3148. 2783. 4770. 5844. 1484. 1333. 5403. 3905. 2846. 2473. 5160. 1605. 2192. 2533. 1964. 2517. 1315. 2795. 5550. 5070. 1108. 2578. 2299. 2905. 1615. 1936. 2056. 1481. 2163. 4065. 963. 1092. 2024. 2538. 1572. 4940. 420. 1613. 6420. 3527. 3216. 4630. 2584. 5040. 2044. 2134. 2506. 1946. 1643. 3859. 1381. 2716. 1845. 384. 2497. 3056. 2594. 1767. 2542. 2216. 1061. 2341. 5450. 2963. 3202. 1329. 1216. 1491. 1397. 992. 1741. 2099. 2665. 2208. 2557. 2313. 902. 1606. 995. 3691. 1489. 1078. 3078. 3362. 3597. 2415. 2798. 1782. 5140. 5220. 4420. 5584. 1435. 2577. 2015. 5670. 1212. 1087. 1984. 3154. 4800. 1921. 2793. 5010. 1611. 2181. 2518. 2311. 2656. 2452. 5610. 6640. 944. 2238. 1536. 1296. 1714. 4600. 2092. 2014. 2932. 1805. 2413. 2115. 6380. 1876. 5980. 2329. 2601. 3555. 2165. 1976. 1094. 1556. 4330. 3002. 2174. 2782. 2717. 2568. 844. 962. 1764. 1248. 3236. 1451. 3745. 2068. 3274. 3052. 2547. 2233. 3915. 2198. 4450. 5090. 2844. 5530. 1084. 1904. 2531. 3118. 6530. 2628. 1425. 1628. 1571. 3136. 2331. 5310. 901. 3674. 1834. 1396. 6120. 6085. 2356. 6090. 1422. 3001. 6290. 2835. 5490. 833. 8020. 3265. 410. 1726. 988. 3064. 1528. 2598. 2448. 2395. 828. 1811. 866. nan 894. 2382. 4285. 1595. 1463. 4133. 2064. 5770. 4870. 1105. 2253.] **************************************************************************************************** Unique Values of basement are [1250. 0. 1320. 1000. 480. 610. 1050. 700. 430. 560. 250. 670. 570. 290. 600. 680. 380. 50. 1020. 690. 1010. 530. 1370. 1040. 790. 910. 820. 1850. 500. 760. 960. 340. 800. 580. 1600. 1680. 900. 420. 450. 200. 240. 950. 1590. 1220. 1500. 710. 80. 140. 1260. 860. 890. 280. 440. 880. 220. 1650. 630. 780. 810. 300. 720. 470. 150. 1180. 1060. 120. 660. 400. 1100. 1780. 640. 1170. 1890. 130. 550. 360. 940. 650. 2730. 870. 730. 1350. 1530. 1540. 620. 1080. 1900. 770. 520. 920. 1110. 830. 1420. 980. 190. 330. 350. 740. 1570. 990. 1390. 260. 540. 1300. 265. 1120. 460. 370. 1830. 1140. 270. 145. 510. 750. 1710. 930. 1870. 1200. 310. 850. 506. 970. 1070. 1450. 840. 90. 3500. 1380. 1090. 1280. 1240. 3480. 1210. 1690. 60. 1800. 2400. 180. 4820. 110. 1030. 2060. 143. 1400. 100. 1270. 2040. 1360. 1740. 590. 1150. 40. 1990. 1340. 1700. 160. 1290. 1190. 1630. 946. 1230. 1430. 2600. 390. 1620. 410. 1950. 1160. 1135. 320. 210. 1460. 170. 1490. 1330. 1760. 207. 2300. 1410. 2090. 1810. 1660. 1940. 3260. 1640. 894. 1440. 2200. 1130. 2010. 1790. 490. 1550. 1560. 230. 70. 276. 417. 652. 2000. 283. 1580. 1670. 1310. 1720. 2390. 2100. 374. 414. 2620. 176. 1910. 515. 1730. 1820. 2080. 666. 1480. 861. 1520. 1470. 1816. 518. 784. 10. 2110. 2050. 4130. 1008. 2330. 2030. 516. 704. 2580. 915. 172. 1510. 602. 2550. 1610. 1284. 1281. 2170. 1798. 2240. 2070. 1930. 1880. 2020. 508. 295. 2360. 2720. 2160. 435. 225. 2220. 1860. 1840. 2590. 2130. 2490. 862. 3000. 2310. 2150. 556. 1852. 475. 1548. 1960. 235. 2610. 875. 1024. 2190. 415. 792. 768. 1248. 1275. 20. 2850. 1525. 2120. 1913. 2250. 65. 1770. 1750. 2570. 2500. 588. 266. 2350. 1481. 274. 248. 935. 1245. 2196. 243. 2810. nan 906. 1920. 2180.] **************************************************************************************************** Unique Values of yr_built are [1966.0 1948.0 2009.0 1924.0 1994.0 2005.0 1978.0 1983.0 2012.0 1912.0 1990.0 1967.0 1919.0 1908.0 1950.0 2000.0 2013.0 1943.0 1922.0 1977.0 2004.0 1935.0 1964.0 1945.0 1987.0 2008.0 1940.0 2003.0 1988.0 1985.0 1998.0 1995.0 1946.0 1984.0 1958.0 1963.0 1942.0 2014.0 1971.0 1936.0 1954.0 1923.0 2002.0 1972.0 2007.0 1930.0 1962.0 1999.0 1953.0 1965.0 2010.0 1997.0 2006.0 1979.0 1996.0 1992.0 1968.0 1980.0 1981.0 1969.0 2001.0 1929.0 1952.0 1916.0 1976.0 1974.0 1920.0 1931.0 1975.0 1960.0 1900.0 '$' 1986.0 1989.0 1906.0 1955.0 1956.0 1915.0 1941.0 1993.0 2011.0 1925.0 1947.0 1991.0 1926.0 1927.0 1951.0 1961.0 1932.0 1917.0 1928.0 1959.0 1921.0 1911.0 1949.0 1982.0 1913.0 1957.0 1914.0 1938.0 1973.0 1937.0 1944.0 1970.0 1901.0 1907.0 1939.0 1918.0 1934.0 1904.0 2015.0 1909.0 1910.0 1905.0 1902.0 1933.0 1903.0 nan] **************************************************************************************************** Unique Values of yr_renovated are [ 0. 1993. 2014. 1983. 1992. 2000. 2011. 1994. 2009. 1944. 1971. 2003. 1955. 1985. 2008. 2015. 2005. 1979. 1998. 1968. 2010. 1989. 2002. 1987. 1999. 1996. 1940. 1986. 1988. 1969. 1995. 2004. 2007. 2013. 2001. 1990. 1958. 2012. 1967. 1991. 1970. 1984. 2006. 1982. 1951. 1960. 1956. 1997. 1980. 1959. 1974. 1973. 1975. 1981. 1963. 1957. 1976. 1948. 1945. 1977. 1978. 1972. 1965. 1964. 1953. 1950. 1962. 1946. 1934. 1954.] **************************************************************************************************** Unique Values of zipcode are [98034. 98118. 98002. 98030. 98103. 98006. 98042. 98031. 98065. 98109. 98058. 98001. 98105. 98115. 98032. 98033. 98199. 98053. 98056. 98102. 98038. 98092. 98003. 98075. 98059. 98008. 98011. 98014. 98023. 98116. 98198. 98126. 98052. 98108. 98133. 98074. 98077. 98106. 98045. 98146. 98155. 98117. 98027. 98040. 98072. 98005. 98055. 98070. 98028. 98166. 98019. 98136. 98107. 98004. 98125. 98112. 98024. 98177. 98122. 98168. 98029. 98007. 98178. 98010. 98188. 98039. 98144. 98022. 98148. 98119.] **************************************************************************************************** Unique Values of lat are [47.7228 47.5546 47.5188 ... 47.4178 47.7594 47.3915] **************************************************************************************************** Unique Values of long are [-122.183 -122.274 -122.256 -122.213 -122.285 '$' -122.333 -122.165 -122.15 -122.178 -121.87 -122.352 -122.122 -122.275 -122.234 -122.324 -122.321 -122.277 -122.196 -122.398 -122.019 -122.181 -122.325 -122.191 -122.026 -122.084 -122.3 -122.287 -122.011 -122.127 -122.281 -122.221 -121.859 -121.913 -122.39 -121.871 -122.013 -122.381 -122.279 -122.194 -122.201 -122.125 -122.379 -122.121 -122.024 -122.389 -122.305 -122.346 -122.375 -122.344 -122.028 -122.193 -122.209 -122.365 -122.28 -122.334 -122.149 -122.388 -122.016 -122.298 -122.153 -121.802 -121.878 -122.218 -122.359 -122.296 -122.361 -122.357 -122.309 -122.056 -121.974 -122.18 -122.331 -122.374 -121.976 -121.975 -122.216 -122.188 -122.391 -122.31 -122.302 -121.972 -122.22 -122.17 -122.101 -121.888 -122.046 -122.154 -122.145 -122.236 -122.168 -122.469 -122.248 -122.412 -122.356 -121.908 -122.048 -122.147 -122.017 -121.955 -122.169 -122.392 -122.081 -122.37 -122.21 -122.049 -122.266 -122.327 -122.164 -122.231 -122.115 -122.044 -122.343 -122.372 -122.349 -122.364 -122.232 -122.308 -122.091 -122.03 -122.176 -122.124 -122.163 -122.151 -122.143 -122.215 -122.214 -122.284 -122.316 -122.228 -122.351 -121.986 -122.118 -122.304 -122.312 -122.219 -122.039 -122.053 -122.409 -122.041 -122.354 -122.148 -122.369 -122.288 -122.326 -122.172 -122.078 -122.059 -122.086 -122.119 -122.32 -122.315 -122.406 -121.943 -122.088 -122.363 -122.303 -122.004 -122.162 -122.317 -122.38 -122.323 -122.376 -121.96 -122.141 -122.226 -122.082 -122.047 -122.341 -122.267 -122.071 -122.328 -121.999 -122.195 -122.273 -122.02 -121.872 -121.912 -121.948 -122.396 -122.138 -122.311 -121.876 -122.366 -122.032 -122.199 -122.368 -122.306 -122.318 -122.037 -121.997 -122.252 -122.297 -122.353 -122.142 -122.179 -122.435 -122.197 -122.036 -122.129 -122.076 -122.295 -122.36 -121.917 -122.055 -122.023 -122.35 -122.385 -122.278 -122.073 -122.025 -122.286 -122.126 -122.205 -122.018 -122.265 -122.342 -122.319 -121.763 -122.347 -122.2 -122.206 -122.015 -122.212 -122.198 -121.982 -122.11 -121.867 -122.031 -122.186 -121.774 -122.112 -122.301 -122.415 -122.339 -122.003 -121.769 -122.24 -122.335 -122.062 -122.185 -122.294 -122.184 -122.203 -121.772 -121.91 -122.099 -121.966 -122.348 -122.27 -122.397 -122.19 -122.144 -122.111 -122.116 -122.133 -121.992 -122.113 -122.174 -122.085 -122.225 -122.07 -122.332 -122.393 -122.146 -122.399 -122.128 -122.386 -122.152 -122.247 -122.217 -122.158 -122.307 -122.367 -122.322 -122.283 -122.202 -122.107 -122.134 -121.967 -121.987 -122.097 -122.242 -122.394 -121.854 -122.075 -122.13 -122.16 -122.095 -122.268 -122.282 -122.329 -122.337 -122.057 -122.167 -121.926 -121.744 -121.991 -122.29 -121.864 -122.314 -122.362 -122.383 -122.292 -122.211 -122.239 -122.033 -122.001 -122.276 -122.061 -122.08 -122.102 -122.034 -121.714 -122.108 -122.43 -121.909 -122.269 -122.224 -122.132 -122.058 -122.077 -121.97 -121.77 -122.403 -122.06 -122.012 -122.405 -121.929 -122.14 -122.299 -122.187 -122.089 -122.26 -122.223 -122.182 -122.204 -122.0 -122.021 -122.355 -122.313 -121.847 -122.253 -122.175 -121.863 -122.109 -122.103 -122.373 -121.995 -122.387 -122.038 -122.395 -122.083 -122.135 -122.338 -122.227 -122.029 -122.027 -121.996 -122.043 -122.378 -122.291 -122.382 -122.002 -122.009 -122.411 -122.072 -122.229 -122.463 -122.064 -122.262 -121.698 -122.33 -122.384 -122.425 -122.09 -122.208 -122.246 -121.884 -122.371 -122.035 -122.157 -122.293 -122.123 -122.045 -122.189 -121.776 -122.114 -121.892 -122.12 -121.865 -122.237 -122.139 -122.4 -122.453 -121.835 -121.783 -122.065 -121.804 -122.087 -122.006 -122.098 -122.263 -121.977 -122.173 -121.998 -121.964 -122.092 -122.207 -122.259 -122.402 -122.177 -122.093 -122.007 -122.155 -122.264 -121.959 -121.979 -122.042 -122.074 -121.868 -121.981 -122.052 -122.377 -122.289 -121.994 -121.968 -122.34 -121.988 -122.079 -122.05 -122.235 -121.92 -122.161 -121.905 -122.014 -122.136 -122.171 -121.879 -122.358 -121.735 -122.166 -122.255 -121.984 -121.869 -121.75 -122.261 -122.105 -122.243 -122.244 -122.008 -121.787 -122.505 -121.897 -121.927 -122.408 -121.73 -121.944 -122.069 -122.094 -122.022 -122.345 -122.222 -121.956 -122.117 -122.068 -122.054 -121.903 -122.272 -121.853 -122.496 -121.646 -122.497 -122.249 -121.883 -122.271 -121.768 -121.855 -121.983 -122.445 -121.895 -121.985 -122.421 -121.962 -122.413 -121.902 -122.04 -122.063 -121.993 -121.989 -121.734 -121.969 -121.914 -122.461 -122.106 -121.861 -121.707 -122.159 -121.881 -122.336 -121.957 -121.894 -122.192 -122.23 -121.98 -121.882 -121.931 -121.94 -121.723 -121.916 -121.907 -122.241 -122.475 -122.104 -122.444 -121.973 -122.066 -122.257 -122.1 -122.137 -122.156 -121.904 -122.472 -121.896 -122.005 -121.8 -121.932 -121.78 -122.067 -122.051 -122.254 -121.953 -121.857 -121.875 -122.258 -122.401 -122.45 -122.01 -122.233 -122.245 -121.922 -121.963 -121.85 -122.447 -122.404 -121.823 -121.934 -122.44 -121.739 -122.519 -121.89 -121.778 -122.131 -122.251 -121.887 -121.757 -121.745 -121.949 -121.841 -121.775 -121.886 -121.911 -121.88 -121.874 -121.829 -121.899 -121.901 -121.965 -121.711 -121.862 -121.866 -121.978 -121.925 -122.238 -121.845 -121.971 -121.737 -121.76 -121.701 -121.771 -121.893 -121.762 -121.828 -121.786 -121.898 -121.719 -121.822 -121.767 -121.9 -121.731 -121.82 -122.473 -122.47 -121.773 -121.756 -121.724 -121.99 -122.25 -121.417 -122.459 -121.924 -121.833 -121.951 -122.41 -121.891 -121.918 -121.936 -122.504 -121.403 -121.952 -122.482 -121.316 -122.414 -121.779 -121.915 -122.446 -122.462 -121.709 -122.502 -122.507 -122.432 -122.438 -121.851 -121.759 -121.781 -121.81 -122.456 -121.764 -121.765 -121.958 -122.407 -121.889 -121.885 -121.873 -122.479 -121.877 -121.954 -121.801 -122.454 -121.782 -122.443 -122.42 -121.746 -121.48 -121.86 -121.846 -121.831 -122.422 -122.455 -121.856 -121.942 -121.352 -121.752 -121.933 -121.935 -122.464 -121.473 -121.743 -122.511 -121.826 -121.758 -121.961 -121.842 -121.93 -122.096 -122.467 -121.359 -122.452 -121.858 -121.946 -121.838 -122.449 -122.474 -122.457 -122.416 -121.799 -122.515 -121.747 -121.906 -121.921 -122.465 -121.733 -121.939 -121.809 -122.509 -122.49 -121.738 -121.364 -121.691 -121.819 -122.431 -121.716 -121.718 -121.748 -121.818 -122.486 -122.46 -122.512 -121.754 -122.506 -121.777 -121.797 -121.721 -121.803 -122.448 -121.95 -121.789 -121.945 -121.676 -121.726 -121.848 -121.736 -122.433 -121.852 -122.503 -121.315 -121.785 -122.514 -122.491 -121.821 -121.755 -121.766 -121.834 -122.439 -121.827 -121.742 -121.937 -121.325 -121.761 -121.784 -121.849 -121.725 -122.451 -122.441 -121.792 -121.815 -121.788 -121.727 -121.795 -121.402 -121.941 -122.458 -121.713 -121.84 -121.319 -122.484 -121.749 -122.499 -121.321 -121.837 -121.708 -121.405 -121.947] **************************************************************************************************** Unique Values of living_measure15 are [2020. 1660. 2620. 2030. 1120. 1610. 1170. 2800. 1850. 1460. 2380. 1880. 2440. 1390. 1060. 1280. 1540. 1200. 3050. 1410. 2300. 1440. 1710. 2850. 1860. 1830. 1550. 2490. 3330. 1350. 2040. 3100. 1790. 2420. 3000. 1870. 1730. 1708. 1670. 1510. 1480. 2820. 1090. 1160. 1220. 2710. 1240. 1320. 2120. 1500. 1450. 2400. 2050. 1380. 2070. 1820. 1620. 1268. 2260. 3110. 1890. 1260. 3380. 1680. 2502. 1340. 1800. 1180. 2650. 2450. 3730. 2110. 2990. 3920. 2330. 2060. 2520. 1330. 1780. 2150. 2790. 2080. 1650. 2530. 1760. 2900. 870. 1770. 2190. 1270. 3140. 1140. 1720. 2214. 3340. 2540. 3400. nan 2610. 1400. 3130. 1530. 1690. 2580. 1990. 3390. 3800. 2140. 910. 1560. 1150. 4110. 2510. 1050. 3360. 1490. 3010. 2130. 3450. 2740. 2200. 2100. 2360. 1580. 3630. 1100. 1020. 1930. 3270. 2310. 1250. 4210. 2170. 2220. 3420. 1940. 1570. 2460. 980. 1810. 2270. 1840. 1470. 920. 1630. 840. 1520. 1590. 1740. 1300. 1010. 1080. 2340. 2430. 3070. 1230. 2350. 3160. 1700. 1980. 1900. 1640. 1420. 2280. 2000. 2250. 1600. 2500. 3290. 2640. 1920. 2700. 1430. 2720. 2320. 1110. 990. 2290. 2410. 1370. 2240. 2180. 3350. 3550. 3170. 2010. 2415. 1130. 2641. 2840. 2750. 3020. 2830. 1290. 3515. 2409. 2910. 1910. 3210. 3250. 5070. 1970. 3470. 2230. 3850. 1960. 1326. 3880. 1750. 810. 2168. 3240. 1190. 1310. 1210. 2880. 2760. 2550. 1070. 3260. 2390. 2090. 2870. 3180. 1352. 3320. 900. 2770. 2630. 1950. 3150. 2940. 2950. 3570. 2160. 2810. 2980. 2780. 2600. 3840. 3200. 3890. 2210. 1360. 3540. 2370. 3640. 1000. 3960. 930. 2475. 3590. 2256. 2660. 1030. 2930. 2730. 3780. 3370. 4010. 5030. 2570. 2560. 3430. 3830. 4020. 4590. 2955. 850. 3670. 4800. 940. 1536. 1365. 1217. 3310. 2890. 4920. 750. 2083. 3950. 3650. 3230. 3740. 4913. 4190. 4600. 2590. 3900. 2470. 3940. 3510. 1979. 2860. 5600. 2960. 1468. 1442. 960. 3480. 3220. 3280. 3190. 1040. 3040. 2155. 3810. 3990. 3625. 970. 1975. 2653. 2970. 3460. 3860. 3720. 2669. 880. 3078. 1466. 950. 2396. 1576. 2680. 2848. 2480. 3530. 1377. 2234. 2255. 1307. 3030. 4362. 3060. 3090. 3300. 3680. 3490. 2670. 1941. 5380. 3610. 2336. 2258. 2920. 3440. 3560. 1664. 3770. 2527. 1792. 4050. 2690. 3870. 3620. 4440. 4240. 1746. 860. 3080. 4390. 2474. 5790. 4040. 4100. 2246. 4660. 3600. 4090. 4130. 4470. 3930. 1414. 720. 2054. 3500. 1654. 2516. 1691. 2077. 1678. 4670. 3568. 1369. 2648. 1445. 4480. 4420. 4700. 2875. 4300. 952. 2434. 3710. 2009. 740. 4560. 2755. 2665. 2125. 890. 2238. 1398. 2547. 2037. 4740. 2688. 2619. 1763. 5080. 1443. 460. 2358. 1494. 4280. 4940. 4220. 3910. 2667. 2165. 3056. 1571. 3790. 4170. 3494. 1961. 4080. 3120. 2004. 1677. 4042. 1525. 1919. 2056. 4760. 4650. 1765. 3410. 2439. 3750. 1137. 780. 4000. 2095. 1528. 1404. 2704. 4320. 3191. 1862. 2297. 1346. 4290. 1884. 2765. 2166. 1358. 1584. 2738. 1277. 3674. 2597. 3980. 2615. 1088. 1304. 3690. 710. 4830. 2028. 1364. 3660. 2496. 2605. 1439. 1805. 4030. 4730. 3970. 3087. 1131. 1714. 830. 3112. 4630. 800. 1408. 4070. 1492. 1448. 1509. 1616. 1092. 2912. 1459. 2517. 998. 4340. 820. 760. 2616. 2683. 2305. 2876. 4640. 3557. 1798. 3820. 4400. 2879. 2344. 2419. 3520. 2029. 1768. 1188. 2242. 1981. 1934. 1132. 5610. 4230. 2052. 1518. 2049. 2005. 2724. 1336. 1813. 4950. 2583. 2901. 1544. 2315. 3700. 3760. 2363. 1638. 1232. 1716. 4770. 2598. 1658. 4930. 6210. 4180. 4310. 4620. 4850. 3639. 4690. 1078. 2575. 4160. 4250. 4060. 4410. 4570. 2555. 4540. 2189. 3580. 2873. 3715. 4495. 2844. 4530. 1767. 1356. 3543. 2996. 2488. 4443. 2014. 1168. 2634. 4140. 1138. 1175. 4510. 1894. 2967. 2112. 4200. 4260. 1309. 2513. 5340. 2478. 2437. 2144. 2451. 2927. 2236. 1458. 2725. 2106. 2389. 1076. 1943. 2175. 2279. 3721. 1425. 690. 1399. 1264. 2533. 2027. 2673. 1415. 3836. 2406. 3618. 2019. 1564. 5220. 1495. 1834. 2495. 2961. 4330. 2578. 2381. 3045. 4750. 3038. 3008. 1554. 1256. 4350. 1321. 399. 2566. 806. 1357. 3413. 1484. 2075. 2221. 3402. 790. 1572. 2403. 1914. 1757. 2424. 1285. 2647. 2142. 5170. 1481. 2586. 1608. 2405. 2815. 1802. 2015. 1427. 5200. 2554. 1156. 2767. 3736. 1381. 2622. 2849. 2981. 1949. 2303. 2216. 2425. 3425. 2002. 3159. 2099. 1855. 1125. 620. 2091. 700. 1566. 2323. 1728. 1516. 4610. 3726. 2156. 3193. 1303. 2412. 4780. 2822. 4900. 1128. 2594. 2136. 1639. 4120. 1921. 4550. 2793. 2253. 2333. 2458. 1984. 2728. 2456. 1537. 2092. 1216. 1876. 2154. 2518. 4890. 1056. 1302. 1696. 1522. 4270. 5110. 4520. 770. 4680. 1665. 3236. 1162. 1569. 2316. 2798. 1098. 5330. 2198. 2273. 2304. 1084. 2604. 1228. 1886. 1847. 1282. 2109. 5500. 2697. 1955. 2441. 1745. 2354. 2574. 4460. 2197. 3335. 1463. 6110. 2384. 1815. 2076. 2114. 4150. 4370. 1502. 828. 2326. 2011. 1811. 2382. 4490. 670. 1546. 1429. 1405. 2612. 5000. 1295. 1786.] **************************************************************************************************** Unique Values of lot_measure15 are [ 8660. 4100. 2433. ... 11491. 2853. 7604.] **************************************************************************************************** Unique Values of furnished are [ 0. 1. nan] **************************************************************************************************** Unique Values of total_area are [12490.0 3771.0 5455.0 ... 16111.0 63597.0 38122.0] ****************************************************************************************************
df[df['room_bed'].isin([10,11,33])]
| dayhours | price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | zipcode | lat | long | living_measure15 | lot_measure15 | furnished | total_area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2209 | 20141029T000000 | 650000.000 | 10.000 | 2.000 | 3610.000 | 11914.000 | 2.000 | 0.000 | 0.000 | 4.000 | 7.000 | 3010.000 | 600.000 | 1958.000 | 0.000 | 98006.000 | 47.571 | -122.175 | 2040.000 | 11914.000 | 0.000 | 15524.000 |
| 2557 | 20141229T000000 | 660000.000 | 10.000 | 3.000 | 2920.000 | 3745.000 | 2.000 | 0.000 | 0.000 | 4.000 | 7.000 | 1860.000 | 1060.000 | 1913.000 | 0.000 | 98105.000 | 47.663 | -122.320 | 1810.000 | 3745.000 | 0.000 | 6665.000 |
| 14140 | 20140814T000000 | 1150000.000 | 10.000 | 5.250 | 4590.000 | 10920.000 | 1.000 | 0.000 | 2.000 | 3.000 | 9.000 | 2500.000 | 2090.000 | 2008.000 | 0.000 | 98004.000 | 47.586 | -122.113 | 2730.000 | 10400.000 | 1.000 | 15510.000 |
| 16913 | 20140625T000000 | 640000.000 | 33.000 | 1.750 | 1620.000 | 6000.000 | 1.000 | 0.000 | 0.000 | 5.000 | 7.000 | 1040.000 | 580.000 | 1947.000 | 0.000 | 98103.000 | 47.688 | -122.331 | 1330.000 | 4700.000 | 0.000 | 7620.000 |
| 20972 | 20140821T000000 | 520000.000 | 11.000 | 3.000 | 3000.000 | 4960.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 2400.000 | 600.000 | 1918.000 | 1999.000 | 98106.000 | 47.556 | -122.363 | 1420.000 | 4960.000 | 0.000 | 7960.000 |
df.loc[df['ceil_measure'].between(1330.000, 2730.000), ['ceil_measure','room_bed']].describe()
| ceil_measure | room_bed | |
|---|---|---|
| count | 11208.000 | 11153.000 |
| mean | 1866.118 | 3.519 |
| std | 395.435 | 0.797 |
| min | 1330.000 | 0.000 |
| 25% | 1520.000 | 3.000 |
| 50% | 1780.000 | 3.000 |
| 75% | 2170.000 | 4.000 |
| max | 2730.000 | 11.000 |
df.replace('$', np.nan, inplace = True)
df['room_bed'] = df['room_bed'].replace([10,11,33], np.nan)
df[(df['room_bath']<1) | (df['room_bed']==0)]
| dayhours | price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | zipcode | lat | long | living_measure15 | lot_measure15 | furnished | total_area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 345 | 20140522T000000 | 275000.000 | 1.000 | 0.750 | 1170.000 | 14149.000 | 1.000 | 0.000 | 0.000 | 5.000 | 7.000 | 880.000 | 290.000 | 1962.000 | 0.000 | 98022.000 | 47.265 | -121.910 | 1130.000 | 24513.000 | 0.000 | 15319.000 |
| 886 | 20140610T000000 | 350000.000 | 2.000 | 0.750 | 1392.000 | 43710.000 | 1.500 | 0.000 | 0.000 | 4.000 | 7.000 | 1392.000 | 0.000 | 1978.000 | 0.000 | 98070.000 | 47.449 | -122.453 | 1640.000 | 99316.000 | 0.000 | 45102.000 |
| 2181 | 20141123T000000 | 180250.000 | 2.000 | 0.750 | 900.000 | 9600.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 900.000 | 0.000 | 1941.000 | 0.000 | 98166.000 | 47.460 | -122.339 | 1250.000 | 14280.000 | 0.000 | 10500.000 |
| 2195 | 20150220T000000 | 132500.000 | 3.000 | 0.750 | 850.000 | 8573.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 600.000 | 250.000 | 1945.000 | 0.000 | 98146.000 | 47.503 | -122.356 | 850.000 | 8382.000 | 0.000 | 9423.000 |
| 2381 | 20140812T000000 | 785000.000 | 2.000 | 0.750 | 1260.000 | 4800.000 | 1.500 | 0.000 | 2.000 | 4.000 | 6.000 | 1080.000 | 180.000 | 1942.000 | 0.000 | 98033.000 | 47.684 | -122.212 | 2660.000 | 7200.000 | 0.000 | 6060.000 |
| 2605 | 20150219T000000 | 156000.000 | 1.000 | 0.750 | 470.000 | 15000.000 | 1.000 | 0.000 | 0.000 | 3.000 | 4.000 | 470.000 | 0.000 | 1947.000 | 0.000 | 98014.000 | 47.655 | -121.908 | 1730.000 | 22500.000 | 0.000 | 15470.000 |
| 2783 | 20140521T000000 | 360000.000 | 2.000 | 0.750 | 850.000 | 7710.000 | 1.000 | 0.000 | 2.000 | 5.000 | 6.000 | 550.000 | 300.000 | 1909.000 | 0.000 | 98108.000 | 47.559 | -122.301 | 2500.000 | 6022.000 | 0.000 | 8560.000 |
| 2796 | 20140627T000000 | 528000.000 | 2.000 | 0.750 | 840.000 | 40642.000 | 1.000 | 1.000 | 4.000 | 4.000 | 6.000 | 840.000 | 0.000 | 1937.000 | 0.000 | 98070.000 | 47.404 | -122.447 | 1850.000 | 64069.000 | 0.000 | 41482.000 |
| 3155 | 20140624T000000 | 1300000.000 | 0.000 | 0.000 | 4810.000 | 28008.000 | 2.000 | 0.000 | 0.000 | 3.000 | 12.000 | 4810.000 | 0.000 | 1990.000 | 0.000 | 98053.000 | 47.664 | -122.069 | 4740.000 | 35061.000 | 1.000 | 32818.000 |
| 3160 | 20140623T000000 | 402101.000 | 2.000 | 0.750 | 1020.000 | 1350.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1020.000 | 0.000 | 2009.000 | 0.000 | 98144.000 | 47.594 | -122.299 | 1020.000 | 2007.000 | 0.000 | 2370.000 |
| 3286 | 20141117T000000 | 339950.000 | 0.000 | 2.500 | 2290.000 | 8319.000 | 2.000 | 0.000 | 0.000 | 3.000 | 8.000 | 2290.000 | 0.000 | 1985.000 | 0.000 | 98042.000 | 47.347 | -122.151 | 2500.000 | 8751.000 | 0.000 | 10609.000 |
| 3405 | 20140926T000000 | 142000.000 | 0.000 | 0.000 | 290.000 | 20875.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 | 290.000 | 0.000 | 1963.000 | 0.000 | 98024.000 | 47.531 | -121.888 | 1620.000 | 22850.000 | 0.000 | 21165.000 |
| 4238 | 20141223T000000 | 235000.000 | 0.000 | 0.000 | 1470.000 | 4800.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1470.000 | 0.000 | 1996.000 | 0.000 | 98065.000 | 47.526 | -121.828 | 1060.000 | 7200.000 | 0.000 | 6270.000 |
| 4332 | 20141223T000000 | 520000.000 | 4.000 | 0.750 | 1960.000 | 8277.000 | 1.000 | 1.000 | 4.000 | 4.000 | 7.000 | 1320.000 | 640.000 | 1923.000 | 1986.000 | 98198.000 | 47.365 | -122.325 | 1940.000 | 8402.000 | 0.000 | 10237.000 |
| 4606 | 20150301T000000 | 151000.000 | 2.000 | 0.750 | 720.000 | 5040.000 | 1.000 | 0.000 | 0.000 | 3.000 | 4.000 | 720.000 | 0.000 | 1949.000 | 0.000 | 98106.000 | 47.532 | -122.347 | 1290.000 | 4120.000 | 0.000 | 5760.000 |
| 4730 | 20140918T000000 | 484000.000 | 1.000 | 0.000 | 690.000 | 23244.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 690.000 | 0.000 | 1948.000 | 0.000 | 98053.000 | 47.643 | -121.955 | 1690.000 | 19290.000 | 0.000 | 23934.000 |
| 4985 | 20150505T000000 | 250000.000 | 1.000 | 0.750 | 940.000 | 87120.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 940.000 | 0.000 | 1944.000 | 0.000 | 98019.000 | 47.718 | -121.956 | 1930.000 | 165528.000 | 0.000 | 88060.000 |
| 5276 | 20140513T000000 | 230000.000 | 2.000 | 0.750 | 650.000 | 5360.000 | 1.000 | 0.000 | 0.000 | 4.000 | 5.000 | 650.000 | 0.000 | 1931.000 | 0.000 | 98133.000 | 47.728 | -122.335 | 1110.000 | 6700.000 | 0.000 | 6010.000 |
| 5438 | 20150225T000000 | 262000.000 | 1.000 | 0.750 | 520.000 | 12981.000 | 1.000 | 0.000 | 0.000 | 5.000 | 3.000 | 520.000 | 0.000 | 1920.000 | 0.000 | 98022.000 | 47.208 | -121.995 | 1340.000 | 12233.000 | 0.000 | 13501.000 |
| 6233 | 20150512T000000 | 435000.000 | 2.000 | 0.750 | 750.000 | 16321.000 | 1.000 | 0.000 | 1.000 | 3.000 | 4.000 | 750.000 | 0.000 | 1936.000 | 0.000 | 98034.000 | 47.699 | -122.229 | 3020.000 | 10625.000 | 0.000 | 17071.000 |
| 6238 | 20150224T000000 | 229950.000 | 3.000 | 0.750 | 1030.000 | 12700.000 | 1.000 | 0.000 | 0.000 | 4.000 | 5.000 | 1030.000 | 0.000 | 1944.000 | 0.000 | 98032.000 | 47.388 | -122.236 | 1140.000 | 6955.000 | 0.000 | 13730.000 |
| 6420 | 20140701T000000 | 276000.000 | 1.000 | 0.750 | 370.000 | 1801.000 | 1.000 | 0.000 | 0.000 | 5.000 | 5.000 | 370.000 | 0.000 | 1923.000 | 0.000 | 98117.000 | 47.678 | -122.389 | 1340.000 | 5000.000 | 0.000 | 2171.000 |
| 6468 | 20140523T000000 | 527550.000 | 1.000 | 0.750 | 820.000 | 59677.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 820.000 | 0.000 | 1999.000 | 0.000 | 98065.000 | 47.532 | -121.764 | 1590.000 | 14163.000 | 0.000 | 60497.000 |
| 6526 | 20150423T000000 | 399950.000 | 2.000 | 0.750 | 1330.000 | 2856.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 930.000 | 400.000 | 1916.000 | 0.000 | 98126.000 | 47.567 | -122.370 | 1330.000 | 2856.000 | 0.000 | 4186.000 |
| 7104 | 20150504T000000 | 351000.000 | 1.000 | 0.750 | 930.000 | 6600.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 930.000 | 0.000 | 1924.000 | 0.000 | 98125.000 | 47.713 | -122.296 | 1590.000 | 6600.000 | 0.000 | 7530.000 |
| 7132 | 20150121T000000 | 272000.000 | 1.000 | 0.750 | 1040.000 | 6034.000 | 1.000 | 0.000 | 1.000 | 3.000 | 7.000 | 580.000 | 460.000 | 1991.000 | 0.000 | 98178.000 | 47.508 | -122.251 | 1560.000 | 5650.000 | 0.000 | 7074.000 |
| 7503 | 20150326T000000 | 170000.000 | 1.000 | 0.750 | 850.000 | 5600.000 | 1.000 | 0.000 | 2.000 | 3.000 | 6.000 | 850.000 | 0.000 | 1903.000 | 1994.000 | 98019.000 | 47.765 | -121.480 | 900.000 | 12250.000 | 0.000 | 6450.000 |
| 7596 | 20141203T000000 | 355000.000 | 1.000 | 0.750 | 530.000 | 33278.000 | 1.000 | 0.000 | 2.000 | 4.000 | 4.000 | 530.000 | 0.000 | 1950.000 | 0.000 | 98074.000 | 47.641 | -122.079 | 2830.000 | 14311.000 | 0.000 | 33808.000 |
| 7740 | 20140715T000000 | 115000.000 | 2.000 | 0.750 | 550.000 | 7980.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 550.000 | 0.000 | 1952.000 | 0.000 | 98146.000 | 47.511 | -122.348 | 1330.000 | 7980.000 | 0.000 | 8530.000 |
| 8005 | 20140617T000000 | 142500.000 | 4.000 | 0.750 | 1440.000 | 13300.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 1440.000 | 0.000 | 1948.000 | 0.000 | 98166.000 | 47.476 | -122.337 | 1460.000 | 11100.000 | 0.000 | 14740.000 |
| 8327 | 20140917T000000 | 340000.000 | 2.000 | 0.750 | 1060.000 | 48292.000 | 1.000 | 1.000 | 2.000 | 5.000 | 6.000 | 560.000 | 500.000 | 1947.000 | 0.000 | 98070.000 | 47.428 | -122.511 | 750.000 | 80201.000 | 0.000 | 49352.000 |
| 8340 | 20150218T000000 | 320000.000 | 0.000 | 2.500 | 1490.000 | 7111.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1490.000 | 0.000 | 1999.000 | 0.000 | 98065.000 | 47.526 | -121.826 | 1500.000 | 4675.000 | 0.000 | 8601.000 |
| 8407 | 20140825T000000 | 145000.000 | 1.000 | 0.750 | 480.000 | 9750.000 | 1.000 | 0.000 | 0.000 | 2.000 | 4.000 | 480.000 | 0.000 | 1948.000 | 0.000 | 98146.000 | 47.498 | -122.362 | 1550.000 | 9924.000 | 0.000 | 10230.000 |
| 8560 | 20150318T000000 | 247500.000 | 3.000 | 0.750 | 1300.000 | 72309.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 680.000 | 620.000 | 1950.000 | 1987.000 | 98006.000 | 47.567 | -122.124 | 3080.000 | 8395.000 | 0.000 | 73609.000 |
| 8877 | 20140805T000000 | 288000.000 | 0.000 | 1.500 | 1430.000 | 1650.000 | 3.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1430.000 | 0.000 | 1999.000 | 0.000 | 98125.000 | 47.722 | -122.290 | 1430.000 | 1650.000 | 0.000 | 3080.000 |
| 8892 | 20140728T000000 | 230000.000 | 3.000 | 0.750 | 1040.000 | 15000.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 1040.000 | 0.000 | 1941.000 | 0.000 | 98028.000 | 47.764 | -122.234 | 1410.000 | 19000.000 | 0.000 | 16040.000 |
| 9174 | 20141003T000000 | 273000.000 | 2.000 | 0.500 | 1180.000 | 7750.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 590.000 | 590.000 | 1945.000 | 0.000 | 98155.000 | 47.769 | -122.316 | 1380.000 | 8976.000 | 0.000 | 8930.000 |
| 9383 | 20141212T000000 | 312500.000 | 4.000 | 0.500 | 2300.000 | 5570.000 | 2.000 | 0.000 | 0.000 | 3.000 | 8.000 | 2300.000 | 0.000 | 1996.000 | 0.000 | 98092.000 | 47.328 | -122.168 | 1820.000 | 6371.000 | 0.000 | 7870.000 |
| 9492 | 20150122T000000 | 699999.000 | 3.000 | 0.750 | 1240.000 | 4000.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 1240.000 | 0.000 | 1968.000 | 0.000 | 98112.000 | 47.624 | -122.297 | 1460.000 | 4000.000 | 0.000 | 5240.000 |
| 9537 | 20141022T000000 | 352000.000 | 2.000 | 0.750 | 760.000 | 33801.000 | 1.000 | 0.000 | 0.000 | 4.000 | 4.000 | 760.000 | 0.000 | 1931.000 | 0.000 | 98059.000 | 47.470 | -122.076 | 1100.000 | 39504.000 | 0.000 | 34561.000 |
| 9758 | 20150406T000000 | 250000.000 | 2.000 | 0.750 | 700.000 | 16828.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 700.000 | 0.000 | 1958.000 | 0.000 | 98092.000 | 47.301 | -122.125 | 2010.000 | 29316.000 | 0.000 | 17528.000 |
| 10046 | 20141022T000000 | 200000.000 | 2.000 | 0.750 | 780.000 | 55764.000 | 1.000 | 0.000 | 0.000 | 4.000 | 4.000 | 780.000 | 0.000 | 1945.000 | 0.000 | 98058.000 | 47.442 | -122.105 | 1620.000 | 30847.000 | 0.000 | 56544.000 |
| 10454 | 20150204T000000 | 230000.000 | 2.000 | 0.750 | 890.000 | 19703.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 890.000 | 0.000 | 1934.000 | 0.000 | 98045.000 | 47.492 | -121.783 | 1270.000 | 9800.000 | 0.000 | 20593.000 |
| 10647 | 20150505T000000 | 95000.000 | 1.000 | 0.750 | 760.000 | 5746.000 | 1.000 | 0.000 | 0.000 | 4.000 | 5.000 | 760.000 | 0.000 | 1915.000 | 0.000 | 98002.000 | 47.305 | -122.215 | 970.000 | 6696.000 | 0.000 | 6506.000 |
| 10847 | 20140523T000000 | 80000.000 | 1.000 | 0.750 | 430.000 | 5050.000 | 1.000 | 0.000 | 0.000 | 2.000 | 4.000 | 430.000 | 0.000 | 1912.000 | 0.000 | 98014.000 | 47.650 | -121.909 | 1200.000 | 7500.000 | 0.000 | 5480.000 |
| 11104 | 20140930T000000 | 170000.000 | 3.000 | 0.750 | 1040.000 | 42180.000 | 1.000 | 0.000 | 0.000 | 2.000 | 6.000 | 1040.000 | 0.000 | 1947.000 | 0.000 | 98055.000 | 47.452 | -122.199 | 1270.000 | 24090.000 | 0.000 | 43220.000 |
| 11348 | 20140814T000000 | 330000.000 | 2.000 | 0.750 | 520.000 | 6862.000 | 1.000 | 0.000 | 0.000 | 4.000 | 4.000 | 520.000 | 0.000 | 1924.000 | 1980.000 | 98010.000 | 47.326 | -122.037 | 1170.000 | 8756.000 | 0.000 | 7382.000 |
| 11351 | 20140814T000000 | 255000.000 | 1.000 | 0.500 | 880.000 | 1642.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 500.000 | 380.000 | 1910.000 | 0.000 | 98126.000 | 47.573 | -122.372 | 1410.000 | 2992.000 | 0.000 | 2522.000 |
| 11661 | 20140620T000000 | 245000.000 | 1.000 | 0.750 | 380.000 | 15000.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 380.000 | 0.000 | 1963.000 | 0.000 | 98168.000 | 47.481 | -122.323 | 1170.000 | 15000.000 | 0.000 | 15380.000 |
| 11924 | 20141015T000000 | 325000.000 | 2.000 | 0.750 | 1020.000 | 1076.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1020.000 | 0.000 | 2008.000 | 0.000 | 98144.000 | 47.594 | -122.299 | 1020.000 | 1357.000 | 0.000 | 2096.000 |
| 12368 | 20141027T000000 | 369900.000 | 1.000 | 0.750 | 760.000 | 10079.000 | 1.000 | 1.000 | 4.000 | 5.000 | 5.000 | 760.000 | 0.000 | 1936.000 | 0.000 | 98070.000 | 47.468 | -122.438 | 1230.000 | 14267.000 | 0.000 | 10839.000 |
| 12374 | 20141009T000000 | 290000.000 | 2.000 | 0.750 | 440.000 | 8313.000 | 1.000 | 1.000 | 3.000 | 4.000 | 5.000 | 440.000 | 0.000 | 1943.000 | 0.000 | 98070.000 | 47.434 | -122.512 | 880.000 | 26289.000 | 0.000 | 8753.000 |
| 12395 | 20140820T000000 | 316000.000 | 3.000 | 0.750 | 1270.000 | 10092.000 | 1.000 | 0.000 | 0.000 | 5.000 | 7.000 | 1270.000 | 0.000 | 1971.000 | 0.000 | 98077.000 | 47.757 | -122.073 | 1300.000 | 10375.000 | 0.000 | 11362.000 |
| 12401 | 20150114T000000 | 109000.000 | 2.000 | 0.500 | 580.000 | 6900.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 580.000 | 0.000 | 1941.000 | 0.000 | 98118.000 | 47.514 | -122.262 | 1570.000 | 5040.000 | 0.000 | 7480.000 |
| 12734 | 20140923T000000 | 290000.000 | 1.000 | 0.750 | 740.000 | 1284.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 740.000 | 0.000 | 1928.000 | 0.000 | 98107.000 | 47.674 | -122.406 | 1430.000 | 3988.000 | 0.000 | 2024.000 |
| 13593 | 20140625T000000 | 190000.000 | 1.000 | 0.750 | 930.000 | 29258.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 930.000 | 0.000 | 1941.000 | 0.000 | 98178.000 | 47.484 | -122.236 | 2000.000 | 18321.000 | 0.000 | 30188.000 |
| 13624 | 20141211T000000 | 205000.000 | 3.000 | 0.750 | 1080.000 | 5025.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 1080.000 | 0.000 | 1948.000 | 0.000 | 98146.000 | 47.494 | -122.335 | 1370.000 | 6000.000 | 0.000 | 6105.000 |
| 13745 | 20150424T000000 | 224000.000 | 1.000 | 0.750 | 840.000 | 7203.000 | 1.500 | 0.000 | 0.000 | 3.000 | 6.000 | 840.000 | 0.000 | 1949.000 | 0.000 | 98168.000 | 47.476 | -122.301 | 1560.000 | 8603.000 | 0.000 | 8043.000 |
| 14341 | 20150304T000000 | 230000.000 | 2.000 | 0.750 | 900.000 | 3527.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 900.000 | 0.000 | 1939.000 | 0.000 | 98146.000 | 47.508 | -122.336 | 1220.000 | 4080.000 | 0.000 | 4427.000 |
| 14445 | 20140703T000000 | 100000.000 | 2.000 | 0.750 | 660.000 | 5240.000 | 1.000 | 0.000 | 0.000 | 4.000 | 4.000 | 660.000 | 0.000 | 1912.000 | 0.000 | 98032.000 | 47.388 | -122.234 | 850.000 | 5080.000 | 0.000 | 5900.000 |
| 14550 | 20140709T000000 | 150000.000 | 3.000 | 0.750 | 490.000 | 38500.000 | 1.500 | 0.000 | 0.000 | 4.000 | 5.000 | 490.000 | 0.000 | 1959.000 | 0.000 | 98014.000 | 47.711 | -121.315 | 800.000 | 18297.000 | 0.000 | 38990.000 |
| 14922 | 20141029T000000 | 265000.000 | 0.000 | 0.750 | 384.000 | 213444.000 | 1.000 | 0.000 | 0.000 | 3.000 | 4.000 | 384.000 | 0.000 | 2003.000 | 0.000 | 98070.000 | 47.418 | -122.491 | 1920.000 | 224341.000 | 0.000 | 213828.000 |
| 15556 | 20141218T000000 | 405000.000 | 2.000 | 0.750 | 1160.000 | 15029.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 870.000 | 290.000 | 1937.000 | 0.000 | 98014.000 | 47.693 | -121.870 | 1870.000 | 25346.000 | 0.000 | 16189.000 |
| 15593 | 20150205T000000 | 380000.000 | 0.000 | 0.000 | 1470.000 | 979.000 | 3.000 | 0.000 | 2.000 | 3.000 | 8.000 | 1470.000 | 0.000 | 2006.000 | 0.000 | 98133.000 | 47.715 | -122.356 | 1470.000 | 1399.000 | 0.000 | 2449.000 |
| 16144 | 20150410T000000 | 352000.000 | 3.000 | 0.750 | 1240.000 | 7200.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1240.000 | 0.000 | 1947.000 | 0.000 | 98133.000 | 47.730 | -122.342 | 1210.000 | 7200.000 | 0.000 | 8440.000 |
| 16425 | 20150304T000000 | 413252.000 | 3.000 | 0.750 | 1110.000 | 3960.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1110.000 | 0.000 | 1951.000 | 0.000 | 98117.000 | 47.683 | -122.366 | 1610.000 | 5530.000 | 0.000 | 5070.000 |
| 16780 | 20140916T000000 | 550000.000 | 2.000 | 0.750 | 1040.000 | 4000.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 930.000 | 110.000 | 1909.000 | 0.000 | 98119.000 | 47.649 | -122.372 | 1700.000 | 4800.000 | 0.000 | 5040.000 |
| 17055 | 20141113T000000 | 200000.000 | 1.000 | 0.750 | 680.000 | 9600.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 680.000 | 0.000 | 1947.000 | 0.000 | 98115.000 | 47.696 | -122.306 | 1580.000 | 6624.000 | 0.000 | 10280.000 |
| 17704 | 20150429T000000 | 355000.000 | 0.000 | 0.000 | 2460.000 | 8049.000 | 2.000 | 0.000 | 0.000 | 3.000 | 8.000 | 2460.000 | 0.000 | 1990.000 | 0.000 | 98031.000 | 47.410 | -122.168 | 2520.000 | 8050.000 | 0.000 | 10509.000 |
| 17844 | 20150420T000000 | 310000.000 | 1.000 | 0.750 | 520.000 | 2885.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 520.000 | 0.000 | 1947.000 | 0.000 | 98117.000 | 47.689 | -122.378 | 980.000 | 4241.000 | 0.000 | 3405.000 |
| 17859 | 20141002T000000 | 228000.000 | 0.000 | 1.000 | 390.000 | 5900.000 | 1.000 | 0.000 | 0.000 | 2.000 | 4.000 | 390.000 | 0.000 | 1953.000 | 0.000 | 98118.000 | 47.526 | -122.261 | 2170.000 | 6000.000 | 0.000 | 6290.000 |
| 17946 | 20150217T000000 | 75000.000 | 1.000 | 0.000 | 670.000 | 43377.000 | 1.000 | 0.000 | 0.000 | 3.000 | 3.000 | 670.000 | 0.000 | 1966.000 | 0.000 | 98022.000 | 47.264 | -121.906 | 1160.000 | 42882.000 | 0.000 | 44047.000 |
| 17962 | 20140625T000000 | 562100.000 | 2.000 | 0.750 | 1440.000 | 3700.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1200.000 | 240.000 | 1914.000 | 0.000 | 98107.000 | 47.671 | -122.364 | 1440.000 | 4300.000 | 0.000 | 5140.000 |
| 17974 | 20140612T000000 | 280000.000 | 1.000 | 0.750 | 420.000 | 6720.000 | 1.000 | 0.000 | 0.000 | 3.000 | 5.000 | 420.000 | 0.000 | 1922.000 | 0.000 | 98108.000 | 47.552 | -122.311 | 1420.000 | 6720.000 | 0.000 | 7140.000 |
| 17975 | 20150120T000000 | 190000.000 | 1.000 | 0.750 | 780.000 | 77603.000 | 1.000 | 0.000 | 0.000 | 1.000 | 5.000 | 780.000 | 0.000 | 1945.000 | 0.000 | 98058.000 | 47.440 | -122.104 | 1750.000 | 30847.000 | 0.000 | 78383.000 |
| 18003 | 20141022T000000 | 205000.000 | 3.000 | 0.750 | 770.000 | 7000.000 | 1.000 | 0.000 | 0.000 | 3.000 | 4.000 | 770.000 | 0.000 | 1942.000 | 0.000 | 98024.000 | 47.566 | -121.887 | 950.000 | 10500.000 | 0.000 | 7770.000 |
| 18170 | 20141003T000000 | 124000.000 | 1.000 | 0.750 | 840.000 | 7203.000 | 1.500 | 0.000 | 0.000 | 3.000 | 6.000 | 840.000 | 0.000 | 1949.000 | 0.000 | 98168.000 | 47.476 | -122.301 | 1560.000 | 8603.000 | 0.000 | 8043.000 |
| 18283 | 20140925T000000 | 240000.000 | 0.000 | 2.500 | 1810.000 | 5669.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1810.000 | 0.000 | 2003.000 | 0.000 | 98038.000 | 47.349 | -122.053 | 1810.000 | 5685.000 | 0.000 | 7479.000 |
| 18596 | 20150413T000000 | 139950.000 | 0.000 | 0.000 | 844.000 | 4269.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 844.000 | 0.000 | 1913.000 | 0.000 | 98001.000 | 47.278 | -122.250 | 1380.000 | 9600.000 | 0.000 | 5113.000 |
| 18764 | 20140826T000000 | 204950.000 | 2.000 | 0.750 | 1130.000 | 11429.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1130.000 | 0.000 | 1956.000 | 0.000 | 98188.000 | 47.453 | -122.284 | 1550.000 | 10700.000 | 0.000 | 12559.000 |
| 18771 | 20141230T000000 | 530000.000 | 3.000 | 0.750 | 920.000 | 20412.000 | 1.000 | 1.000 | 2.000 | 5.000 | 6.000 | 920.000 | 0.000 | 1950.000 | 0.000 | 98070.000 | 47.478 | -122.490 | 1162.000 | 54705.000 | 0.000 | 21332.000 |
| 19050 | 20141027T000000 | 355000.000 | 3.000 | 0.750 | 1420.000 | 3060.000 | 1.000 | 0.000 | 0.000 | 4.000 | 7.000 | 860.000 | 560.000 | 1923.000 | 0.000 | 98103.000 | 47.687 | -122.346 | 1350.000 | 4000.000 | 0.000 | 4480.000 |
| 19583 | 20150316T000000 | 363000.000 | 3.000 | 0.750 | 2510.000 | 20000.000 | 2.000 | 0.000 | 0.000 | 4.000 | 7.000 | 2510.000 | 0.000 | 1961.000 | 0.000 | 98001.000 | 47.287 | -122.287 | 2130.000 | 20000.000 | 0.000 | 22510.000 |
| 20346 | 20140807T000000 | 210000.000 | 2.000 | 0.750 | 840.000 | 49658.000 | 1.000 | 0.000 | 0.000 | 2.000 | 6.000 | 840.000 | 0.000 | 1948.000 | 0.000 | 98168.000 | 47.473 | -122.292 | 1240.000 | 11000.000 | 0.000 | 50498.000 |
| 20355 | 20141104T000000 | 280000.000 | 1.000 | 0.000 | 600.000 | 24501.000 | 1.000 | 0.000 | 0.000 | 2.000 | 3.000 | 600.000 | 0.000 | 1950.000 | 0.000 | 98045.000 | 47.532 | -121.749 | 990.000 | 22549.000 | 0.000 | 25101.000 |
| 20598 | 20140714T000000 | 202000.000 | 1.000 | 0.750 | 590.000 | 5650.000 | 1.000 | 0.000 | 0.000 | 3.000 | 6.000 | 590.000 | 0.000 | 1944.000 | 0.000 | 98118.000 | 47.518 | -122.267 | 980.000 | 5650.000 | 0.000 | 6240.000 |
| 20616 | 20141023T000000 | 385000.000 | 3.000 | 0.750 | 1330.000 | 7020.000 | 1.000 | 0.000 | 0.000 | 5.000 | 7.000 | 1330.000 | 0.000 | 1924.000 | 0.000 | 98126.000 | 47.538 | -122.376 | 1410.000 | 5802.000 | 0.000 | 8350.000 |
| 20799 | 20140604T000000 | 299000.000 | 1.000 | 0.750 | 560.000 | 12120.000 | 1.000 | 0.000 | 0.000 | 3.000 | 4.000 | 560.000 | 0.000 | 1967.000 | 0.000 | 98014.000 | 47.675 | -121.854 | 1300.000 | 19207.000 | 0.000 | 12680.000 |
| 20843 | 20150130T000000 | 325000.000 | 1.000 | 0.750 | 410.000 | 8636.000 | 1.000 | 0.000 | 0.000 | 2.000 | 4.000 | 410.000 | 0.000 | 1953.000 | 0.000 | 98146.000 | 47.508 | -122.357 | 1190.000 | 8636.000 | 0.000 | 9046.000 |
| 20920 | 20141016T000000 | 315000.000 | 1.000 | 0.750 | 770.000 | 4600.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 770.000 | 0.000 | 1910.000 | 0.000 | 98126.000 | 47.556 | -122.377 | 1550.000 | 4600.000 | 0.000 | 5370.000 |
| 20957 | 20140612T000000 | 1100000.000 | 0.000 | 0.000 | 3064.000 | 4764.000 | 3.500 | 0.000 | 2.000 | 3.000 | 7.000 | 3064.000 | 0.000 | 1990.000 | 0.000 | 98102.000 | 47.636 | -122.322 | 2360.000 | 4000.000 | 0.000 | 7828.000 |
# Replacing 0 bedroom and bathroom counts with null values
df['room_bed'] = df['room_bed'].replace(0, np.nan)
df['room_bath'] = df['room_bath'].replace(0, np.nan)
df.isna().sum().sort_values(ascending=False) #Reporting the number of obs in each column with np.Nan
living_measure15 166 room_bed 126 room_bath 118 condition 85 ceil 72 total_area 68 sight 57 lot_measure 42 long 34 coast 31 furnished 29 lot_measure15 29 living_measure 17 yr_built 15 ceil_measure 1 basement 1 quality 1 zipcode 0 lat 0 yr_renovated 0 price 0 dayhours 0 dtype: int64
# Creating new vars for month and year house was sold
df['dayhours'] = df['dayhours'].str.replace('T000000', '')
df['dayhours'] = pd.to_datetime(df['dayhours'], format = '%Y%m%d')
df['year_sold'] = df['dayhours'].dt.year
df['month_sold'] = df['dayhours'].dt.month
dayhours = df['dayhours'].copy()
df.drop(columns=['dayhours'], inplace=True) # We no longer need this variable for modeling
Code below looks for the rows with the same number of missing observations and
num_missing = df.isnull().sum(axis = 1) #number of missing values by row
num_miss_values = pd.DataFrame(data = num_missing.value_counts()).reset_index()
num_miss_values.columns = ['# of NaNs', '# of rows']
print('Number of Rows with the same Missing Count')
num_miss_values
Number of Rows with the same Missing Count
| # of NaNs | # of rows | |
|---|---|---|
| 0 | 0 | 21267 |
| 1 | 3 | 99 |
| 2 | 2 | 83 |
| 3 | 1 | 81 |
| 4 | 4 | 71 |
| 5 | 5 | 11 |
| 6 | 9 | 1 |
for n in num_missing.value_counts().index:
if n > 0:
print(f'For the rows with exactly {n} missing values, NAs are found in:')
n_miss_per_col = df[num_missing == n].isnull().sum()
print(n_miss_per_col[n_miss_per_col > 0])
print('\n')
For the rows with exactly 3 missing values, NAs are found in: room_bed 55 room_bath 55 living_measure 17 sight 44 condition 44 living_measure15 82 dtype: int64 For the rows with exactly 2 missing values, NAs are found in: room_bed 7 room_bath 7 lot_measure 41 ceil 71 coast 30 sight 1 condition 1 long 4 total_area 4 dtype: int64 For the rows with exactly 1 missing values, NAs are found in: room_bed 11 room_bath 3 long 30 living_measure15 2 total_area 35 dtype: int64 For the rows with exactly 4 missing values, NAs are found in: room_bed 42 room_bath 42 condition 28 yr_built 14 living_measure15 71 lot_measure15 29 furnished 29 total_area 29 dtype: int64 For the rows with exactly 5 missing values, NAs are found in: room_bed 11 room_bath 11 sight 11 condition 11 living_measure15 11 dtype: int64 For the rows with exactly 9 missing values, NAs are found in: lot_measure 1 ceil 1 coast 1 sight 1 condition 1 quality 1 ceil_measure 1 basement 1 yr_built 1 dtype: int64
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 21613 entries, 0 to 21612 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 price 21613 non-null float64 1 room_bed 21487 non-null float64 2 room_bath 21495 non-null float64 3 living_measure 21596 non-null float64 4 lot_measure 21571 non-null float64 5 ceil 21541 non-null float64 6 coast 21582 non-null float64 7 sight 21556 non-null float64 8 condition 21528 non-null float64 9 quality 21612 non-null float64 10 ceil_measure 21612 non-null float64 11 basement 21612 non-null float64 12 yr_built 21598 non-null float64 13 yr_renovated 21613 non-null float64 14 zipcode 21613 non-null float64 15 lat 21613 non-null float64 16 long 21579 non-null float64 17 living_measure15 21447 non-null float64 18 lot_measure15 21584 non-null float64 19 furnished 21584 non-null float64 20 total_area 21545 non-null float64 21 year_sold 21613 non-null int64 22 month_sold 21613 non-null int64 dtypes: float64(21), int64(2) memory usage: 4.0 MB
There are too many unique zip code variables to get meaningful EDA and my cause overfitting. Grouping zipcodes based on their first digits. But should it be the first two, three, or four? The code below shows the optimum is the first FOUR digits as two and three leave so few groups and five just leaves all of the unique zipcodes.
# Seeing how many unique values exist if we group zipcodes by their first two, three, or four digits
for i in range(2,6):
print(f"{str(i)} has {df.zipcode.astype('str').str[0:i].nunique()} unique values")
2 has 1 unique values 3 has 2 unique values 4 has 19 unique values 5 has 70 unique values
# The first digit indicates one of the regions and second digit indicates the sub region or one of the postal circles (States),
# So using first 2 digits will do work for our model
df['zipcode'] = df['zipcode'].astype(str)
print(df['zipcode'].str[0:4].nunique())
df['zipcode2'] = df['zipcode'].str[0:4]
19
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 21613 entries, 0 to 21612 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 price 21613 non-null float64 1 room_bed 21487 non-null float64 2 room_bath 21495 non-null float64 3 living_measure 21596 non-null float64 4 lot_measure 21571 non-null float64 5 ceil 21541 non-null float64 6 coast 21582 non-null float64 7 sight 21556 non-null float64 8 condition 21528 non-null float64 9 quality 21612 non-null float64 10 ceil_measure 21612 non-null float64 11 basement 21612 non-null float64 12 yr_built 21598 non-null float64 13 yr_renovated 21613 non-null float64 14 zipcode 21613 non-null object 15 lat 21613 non-null float64 16 long 21579 non-null float64 17 living_measure15 21447 non-null float64 18 lot_measure15 21584 non-null float64 19 furnished 21584 non-null float64 20 total_area 21545 non-null float64 21 year_sold 21613 non-null int64 22 month_sold 21613 non-null int64 23 zipcode2 21613 non-null object dtypes: float64(20), int64(2), object(2) memory usage: 4.1+ MB
There are some variable that I intend to use as integers in the model that I am going to bunch with those that I intend to use as categorical dummy values so I can transform them into category data types in a separate df called 'df2' to be used during EDA. The reason being that I am more interested in seeing the mode for things like number of bedroooms than I am in seeing the average for bedrooms.
df2 = df.copy()
cat = ['room_bed','room_bath','yr_built','yr_renovated','ceil','coast','sight','condition',
'zipcode', 'zipcode2', 'quality','furnished', 'month_sold', 'year_sold']
for i in cat:
df2[i] = df2[i].astype('category')
pd.set_option('display.float_format', lambda x: '%.3f' % x)
round(df2.describe(include = np.number),3)
| price | living_measure | lot_measure | ceil_measure | basement | lat | long | living_measure15 | lot_measure15 | total_area | |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 21613.000 | 21596.000 | 21571.000 | 21612.000 | 21612.000 | 21613.000 | 21579.000 | 21447.000 | 21584.000 | 21545.000 |
| mean | 540182.159 | 2079.861 | 15104.583 | 1788.367 | 291.523 | 47.560 | -122.214 | 1987.066 | 12766.543 | 17192.042 |
| std | 367362.232 | 918.496 | 41423.619 | 828.103 | 442.581 | 0.139 | 0.141 | 685.520 | 27286.987 | 41628.688 |
| min | 75000.000 | 290.000 | 520.000 | 290.000 | 0.000 | 47.156 | -122.519 | 399.000 | 651.000 | 1423.000 |
| 25% | 321950.000 | 1429.250 | 5040.000 | 1190.000 | 0.000 | 47.471 | -122.328 | 1490.000 | 5100.000 | 7032.000 |
| 50% | 450000.000 | 1910.000 | 7618.000 | 1560.000 | 0.000 | 47.572 | -122.230 | 1840.000 | 7620.000 | 9575.000 |
| 75% | 645000.000 | 2550.000 | 10684.500 | 2210.000 | 560.000 | 47.678 | -122.125 | 2360.000 | 10087.000 | 13000.000 |
| max | 7700000.000 | 13540.000 | 1651359.000 | 9410.000 | 4820.000 | 47.778 | -121.315 | 6210.000 | 871200.000 | 1652659.000 |
# The max for price seems suspect
df2[df2['price']>=7700000.000]
| price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | zipcode | lat | long | living_measure15 | lot_measure15 | furnished | total_area | year_sold | month_sold | zipcode2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1068 | 7700000.000 | 6.000 | 8.000 | 12050.000 | 27600.000 | 2.500 | 0.000 | 3.000 | 4.000 | 13.000 | 8570.000 | 3480.000 | 1910.000 | 1987.000 | 98102.0 | 47.630 | -122.323 | 3940.000 | 8800.000 | 1.000 | 39650.000 | 2014 | 10 | 9810 |
Well the house has 6 bedrooms, 8 bathrooms, condition of 4 and quality of 13. Appears to just be a mansion
pd.set_option('display.float_format', lambda x: '%.3f' % x)
round(df2.describe(include = 'category'),3)
| room_bed | room_bath | ceil | coast | sight | condition | quality | yr_built | yr_renovated | zipcode | furnished | year_sold | month_sold | zipcode2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21487.000 | 21495.000 | 21541.000 | 21582.000 | 21556.000 | 21528.000 | 21612.000 | 21598.000 | 21613.000 | 21613 | 21584.000 | 21613 | 21613 | 21613 |
| unique | 9.000 | 29.000 | 6.000 | 2.000 | 5.000 | 5.000 | 12.000 | 116.000 | 70.000 | 70 | 2.000 | 2 | 12 | 19 |
| top | 3.000 | 2.500 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 2014.000 | 0.000 | 98103.0 | 0.000 | 2014 | 5 | 9805 |
| freq | 9767.000 | 5358.000 | 10647.000 | 21421.000 | 19437.000 | 13978.000 | 8981.000 | 559.000 | 20699.000 | 602 | 17338.000 | 14633 | 2414 | 2576 |
#Dist plots and box plots to quantitative varaibles
def quantplot(variable, figsize=(10,8), bins=None, xticker=None):
mean=variable.mean() #Creating the mean for each quantitiave variable
median=variable.median() #Creating the median for each quantitiave variable
Q1= variable.quantile(0.25) #1st quartile
Q3= variable.quantile(0.75) #2nd quartile
IQR=Q3-Q1
Lower_Whisker = Q1 - 1.5*IQR
Upper_Whisker = Q3 + 1.5*IQR
b_outliers = df[variable < Lower_Whisker].count()[1]
u_outliers = df[variable > Upper_Whisker].count()[1]
print(f' The mean and median for {variable.name} are {mean} and {median}, respectively.')
print(f' There are {b_outliers} below the lower whisker and {u_outliers} above the upper whisker.')
#Creating box plots for each quant variable that will align on the same x-a,xis as the dist plot
f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, figsize=figsize, gridspec_kw= {"height_ratios": (0.25, 0.75)})
sns.boxplot(variable, color = 'y', ax=ax_box)
ax_box.axvline(mean, color='r', linestyle='-') #Plotting the mean
ax_box.axvline(median, color='purple', linestyle='--') #Plotting the median
# Creating the dist plots for each quant variable
sns.distplot(variable, color = 'y', bins=bins,ax=ax_hist)
#Increase tick marker frequency
if xticker!= None:
ax_hist.xaxis.set_major_locator(ticker.MultipleLocator(xticker))
ax_hist.axvline(mean, color='r', linestyle='-') #Plotting the mean
ax_hist.axvline(median, color='purple', linestyle='--') #Plotting the median
#Creating axis labels
plt.ylabel("", fontsize=20)
plt.xlabel(variable.name, fontsize=20)
ax_box.set(xlabel='') # removing box plot x-axis label for visual simplicity
plt.legend(("median","mean")) #Creating the legend to help the viewer identify the mean and median
plt.tight_layout(h_pad=3)
#Count plots with percentage over bars
def countplot(var, figsize = (10, 8), rotation=0, rotation_perc = 0, hue = None, order = False):
fig = figsize
plt.figure(figsize = fig)
if order is True:
ax = sns.countplot(var, order = var.value_counts().index, hue = hue)
else:
ax = sns.countplot(var, hue = hue)
#else:
#ax = sns.countplot(var, hue = hue, order=order)
ax.set_xticklabels(ax.get_xticklabels(), rotation=rotation, ha="right")
total = len(var) # length of the column
for p in ax.patches:
percentage = '{:.1f}%'.format(100 * p.get_height()/total) # percentage of each class of the category
x = p.get_x() + p.get_width() / 2 - 0.05 # width of the plot
y = p.get_y() + p.get_height() # hieght of the plot
ax.annotate(percentage, (x, y), size = 14, rotation=rotation_perc) # annotate the percantage
plt.xlabel(var.name, fontsize=20)
plt.show()
quantplot(df2.price)
The mean and median for price are 540182.1587933188 and 450000.0, respectively. There are 0 below the lower whisker and 1155 above the upper whisker.
quantplot(df2.living_measure)
The mean and median for living_measure are 2079.8607612520836 and 1910.0, respectively. There are 0 below the lower whisker and 569 above the upper whisker.
quantplot(df2.lot_measure)
The mean and median for lot_measure are 15104.583283111586 and 7618.0, respectively. There are 0 below the lower whisker and 2405 above the upper whisker.
quantplot(df2.ceil_measure)
The mean and median for ceil_measure are 1788.3665556172498 and 1560.0, respectively. There are 0 below the lower whisker and 609 above the upper whisker.
quantplot(df2.basement)
The mean and median for basement are 291.522533777531 and 0.0, respectively. There are 0 below the lower whisker and 492 above the upper whisker.
quantplot(df2.living_measure15)
The mean and median for living_measure15 are 1987.0655569543526 and 1840.0, respectively. There are 0 below the lower whisker and 539 above the upper whisker.
quantplot(df2.lot_measure15)
The mean and median for lot_measure15 are 12766.543180133433 and 7620.0, respectively. There are 0 below the lower whisker and 2181 above the upper whisker.
quantplot(df2.total_area)
The mean and median for total_area are 17192.041633789744 and 9575.0, respectively. There are 0 below the lower whisker and 2398 above the upper whisker.
countplot(df2.room_bed, order = True)
countplot(df2.room_bath, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.yr_built, order = True, rotation = 60, rotation_perc = 60, figsize = (25, 8))
I feel like data at this granular level is too chaotic. So I am creating a new variable that represents the decade that the house built.
df2['decade_built'] = df2['yr_built'].apply(lambda x: x//10)*10
df2.loc[df2['yr_built'].isnull(), df2.columns.isin([f"decade_built"])] = np.nan
df2['decade_built'] = df2['decade_built'].astype('category')
countplot(df2['decade_built'], order = True, rotation = 60, rotation_perc = 60, figsize = (15, 8))
The 2000s was the largest decade for building homes in this dataset. Makes sense given the housing bubble. Even more so with the 2010's decade that followed the bubble bursting being so far down the list.
The 1950s through 1990s have very similar representations in the data set with the 60's being a slightly better decade than the rest.
countplot(df2.yr_renovated, order = True, rotation = 60, rotation_perc = 60, figsize = (25, 8))
Again, I am going to create another decade variable for renovations
df2['decade_renovated'] = df2.yr_renovated.apply(lambda x: x//10)*10
df2['decade_renovated'] = df2['decade_renovated'].astype('category')
df2.loc[df2['yr_renovated'].isnull(), df2.columns.isin([f"decade_renovated"])] = np.nan
countplot(df2['decade_renovated'], order = True, rotation = 60, rotation_perc = 60, figsize = (15, 8))
countplot(df2.ceil, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.coast, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.sight, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.condition, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.zipcode, order = True, rotation = 60, rotation_perc = 60, figsize = (25, 8))
countplot(df2.zipcode2, order = True, rotation = 60, rotation_perc = 60, figsize = (25, 8))
countplot(df2.quality, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.furnished, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.year_sold, order = True, rotation = 60, rotation_perc = 60)
countplot(df2.month_sold, order = True, rotation = 60, rotation_perc = 60)
continuous_col = list(df2.select_dtypes(exclude=['category']).columns)
plt.subplots(figsize=(15,15))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(df[continuous_col].corr().transpose(),cmap=cmap, fmt=".2f",annot=True)
<AxesSubplot:>
sns.pairplot(df2[continuous_col],diag_kind="kde")
<seaborn.axisgrid.PairGrid at 0x264b71ddac0>
continuous_col.append('zipcode2')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'zipcode2')
continuous_col.remove('zipcode2')
continuous_col.append('condition')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'condition')
continuous_col.remove('condition')
continuous_col.append('furnished')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'furnished')
continuous_col.remove('furnished')
sns.displot(data=df2, x='ceil_measure', hue='furnished', height=8, aspect=1.5)
<seaborn.axisgrid.FacetGrid at 0x26509ed3bb0>
sns.displot(data=df2, x='living_measure', hue='furnished', height=8, aspect=1.5)
<seaborn.axisgrid.FacetGrid at 0x264c7a66eb0>
sns.displot(data=df2, x='living_measure15', hue='furnished', height=8, aspect=1.5)
<seaborn.axisgrid.FacetGrid at 0x264c56c7310>
continuous_col.append('room_bath')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'room_bath')
continuous_col.remove('room_bath')
continuous_col.append('ceil')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'ceil')
continuous_col.remove('ceil')
sns.displot(data=df2, x='ceil_measure', hue='ceil', height=6, aspect=2)
<seaborn.axisgrid.FacetGrid at 0x264c4033640>
sns.displot(data=df2, x='basement', hue='ceil', height=6, aspect=2)
<seaborn.axisgrid.FacetGrid at 0x264ca3b9eb0>
continuous_col.append('coast')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'coast')
continuous_col.remove('coast')
continuous_col.append('year_sold')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'year_sold')
continuous_col.remove('year_sold')
continuous_col.append('month_sold')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'month_sold')
continuous_col.remove('month_sold')
continuous_col.append('decade_built')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'decade_built')
continuous_col.remove('decade_built')
continuous_col.append('quality')
sns.pairplot(df2[continuous_col],diag_kind="kde", hue = 'quality')
continuous_col.remove('quality')
def cramers_V(var1,var2) :
crosstab =np.array(pd.crosstab(var1,var2, rownames=None, colnames=None)) # Cross table building
stat = chi2_contingency(crosstab)[0] # Keeping of the test statistic of the Chi2 test
obs = np.sum(crosstab) # Number of observations
mini = min(crosstab.shape)-1 # Take the minimum value between the columns and the rows of the cross table
return (stat/(obs*mini))
from sklearn import preprocessing
data = df2[cat]
label = preprocessing.LabelEncoder()
data_encoded = pd.DataFrame()
for i in data.columns :
data_encoded[i]=label.fit_transform(data[i])
data_encoded.head()
rows= []
for var1 in data_encoded:
col = []
for var2 in data_encoded :
cramers =cramers_V(data_encoded[var1], data_encoded[var2]) # Cramer's V test
col.append(round(cramers,2)) # Keeping of the rounded value of the Cramer's V
rows.append(col)
cramers_results = np.array(rows)
cramer_v_df = pd.DataFrame(cramers_results, columns = data_encoded.columns, index =data_encoded.columns)
cramer_v_df
| room_bed | room_bath | yr_built | yr_renovated | ceil | coast | sight | condition | zipcode | zipcode2 | quality | furnished | month_sold | year_sold | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| room_bed | 1.000 | 0.180 | 0.040 | 0.010 | 0.020 | 0.000 | 0.010 | 0.030 | 0.020 | 0.010 | 0.030 | 0.040 | 0.000 | 0.000 |
| room_bath | 0.180 | 1.000 | 0.030 | 0.010 | 0.080 | 0.010 | 0.020 | 0.050 | 0.010 | 0.010 | 0.100 | 0.140 | 0.000 | 0.000 |
| yr_built | 0.040 | 0.030 | 1.000 | 0.010 | 0.140 | 0.010 | 0.010 | 0.060 | 0.020 | 0.040 | 0.050 | 0.080 | 0.010 | 0.010 |
| yr_renovated | 0.010 | 0.010 | 0.010 | 1.000 | 0.000 | 0.020 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| ceil | 0.020 | 0.080 | 0.140 | 0.000 | 1.000 | 0.220 | 0.000 | 0.030 | 0.060 | 0.030 | 0.050 | 0.080 | 0.000 | 0.000 |
| coast | 0.000 | 0.010 | 0.010 | 0.020 | 0.220 | 1.000 | 0.180 | 0.000 | 0.020 | 0.000 | 0.020 | 0.000 | 0.000 | 0.000 |
| sight | 0.010 | 0.020 | 0.010 | 0.010 | 0.000 | 0.180 | 1.000 | 0.130 | 0.020 | 0.010 | 0.020 | 0.020 | 0.000 | 0.000 |
| condition | 0.030 | 0.050 | 0.060 | 0.000 | 0.030 | 0.000 | 0.130 | 1.000 | 0.020 | 0.010 | 0.020 | 0.010 | 0.000 | 0.000 |
| zipcode | 0.020 | 0.010 | 0.020 | 0.000 | 0.060 | 0.020 | 0.020 | 0.020 | 1.000 | 1.000 | 0.040 | 0.080 | 0.000 | 0.010 |
| zipcode2 | 0.010 | 0.010 | 0.040 | 0.000 | 0.030 | 0.000 | 0.010 | 0.010 | 1.000 | 1.000 | 0.010 | 0.030 | 0.000 | 0.000 |
| quality | 0.030 | 0.100 | 0.050 | 0.000 | 0.050 | 0.020 | 0.020 | 0.020 | 0.040 | 0.010 | 1.000 | 0.500 | 0.000 | 0.000 |
| furnished | 0.040 | 0.140 | 0.080 | 0.000 | 0.080 | 0.000 | 0.020 | 0.010 | 0.080 | 0.030 | 0.500 | 1.000 | 0.000 | 0.000 |
| month_sold | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.900 |
| year_sold | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.900 | 1.000 |
# Heat map of the table Cramer V table
mask = np.zeros_like(cramer_v_df, dtype=np.bool)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
ax = sns.heatmap(cramer_v_df, mask=mask,vmin=0., vmax=1, square=True)
plt.show()
df2['dayhours'] = dayhours
df2['month_agg'] = df2['dayhours']
count_by_month = pd.DataFrame(df2['month_agg'].value_counts()).reset_index()
count_by_month.columns = ['month_agg','count']
count_by_month['month_agg'] = pd.to_datetime(count_by_month['month_agg'],format = '%Y-%m')
count_by_month['month_agg'] = count_by_month['month_agg'].apply(lambda x: x.strftime('%Y-%m'))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='count', data=count_by_month)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
df2['month_agg'] = df2['month_agg'].apply(lambda x: x.strftime('%Y-%m'))
df2.sort_values(by=['month_agg'], inplace=True)
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='price',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='living_measure',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='lot_measure',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='ceil_measure',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='basement',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='living_measure15',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='lot_measure15',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.figure(figsize = (10,8))
ax = sns.lineplot(x='month_agg',y='total_area',data=df2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
col = 0
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(y='price',x=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(y='price',x=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(y='price',x=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
Text(0.5, 1.0, 'quality')
#Categorical scatter plot
sns.catplot(y='price', x=cat[col], data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
#Categorical scatter plot
sns.catplot(x='price',y=cat[col],data=df2, height=8, aspect=1)
plt.title(f"{cat[col]}", fontsize=20)
col+=1
None of the continuous variables were normally distributed and this will cause issues for the linear regression model without any log transformations. Living measure, ceil measure, and living measure 15 looked like the best continuous variables to use to predict price. Furnshed, zipcode, quality, condition, ceil, yr_renovated, room_bath, and room_bed appear to be important categories when predicting price. Total_area will need to be dropped since it is perfectly correlated with lot_measure.
I am dropping total_area since it is the sum of living measure and lot measure and has perfect colinearity with lot measure. I am dropping zipcode, yr_built, and yr_renovated since we have alternative variables that are aggregated versions of these to help us reduce the dimensionality of our data and make our model more general.
I transforming all of the categorical variables category data types, but I am transforming sight and room_bed to integer data types because I believe it will be best used as a ordinal variable than in the regression rather than a dummy variable for each value for the variables. However, I have to wait until I impute missing values before I can turn these variables to integers.
# Aggregating month_sold to warm_month_sold
# Used https://www.usclimatedata.com to determine that Nov to Feb are warmer and higher house prices when sold.
df.loc[~df['month_sold'].isin([11,12,1,2]), 'warm_month_sold'] = 1
df['warm_month_sold'].fillna(0, inplace=True)
# Creating a categorical var that represents the avg house price per zipcode2 group binned into 3 groups low, medium, and high.
df_mean = df.groupby(by='zipcode2').mean().round(2)[['price']].reset_index()
for i in df['zipcode2'].unique():
df.loc[df['zipcode2']==i, 'avg_price_zip'] = df_mean.loc[df_mean['zipcode2']==i,'price'].values[0]
df['zip_price_cat'] = pd.cut(df['avg_price_zip'], bins=3, labels=['low_price','medium_price','high_price'])
# Dropping columns that are not needed
df.drop(columns=['total_area','zipcode','avg_price_zip','month_sold'], inplace=True)
# Converting category columns to categorical data type for encoder to create dummy variables
cols_to_cat = ['year_sold','warm_month_sold','furnished','coast', 'ceil','zip_price_cat']
df[cols_to_cat] = df[cols_to_cat].astype('category')
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 21613 entries, 0 to 21612 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 price 21613 non-null float64 1 room_bed 21487 non-null float64 2 room_bath 21495 non-null float64 3 living_measure 21596 non-null float64 4 lot_measure 21571 non-null float64 5 ceil 21541 non-null category 6 coast 21582 non-null category 7 sight 21556 non-null float64 8 condition 21528 non-null float64 9 quality 21612 non-null float64 10 ceil_measure 21612 non-null float64 11 basement 21612 non-null float64 12 yr_built 21598 non-null float64 13 yr_renovated 21613 non-null float64 14 lat 21613 non-null float64 15 long 21579 non-null float64 16 living_measure15 21447 non-null float64 17 lot_measure15 21584 non-null float64 18 furnished 21584 non-null category 19 year_sold 21613 non-null category 20 zipcode2 21613 non-null object 21 warm_month_sold 21613 non-null category 22 zip_price_cat 21613 non-null category dtypes: category(6), float64(16), object(1) memory usage: 3.6+ MB
First, the continuous variables are plotted one more time and then plotted again after they have undergone the log transformation. Prior to log transformation, extreme values are analyzed to detect any values that appear to be unreasonable outliers that need to be treated.
# lets plot histogram of all plots
continuous_vars = ['price','living_measure','lot_measure','ceil_measure','basement','living_measure15',
'lot_measure15']
plt.figure(figsize=(17,75))
for i in range(len(continuous_vars)):
plt.subplot(18,3,i+1)
plt.hist(df[continuous_vars[i]])
#sns.displot(df[all_col[i]], kde=True)
plt.tight_layout()
plt.title(continuous_vars[i],fontsize=25)
plt.show()
# outlier detection using boxplot
plt.figure(figsize=(20,30))
for i, variable in enumerate(continuous_vars):
plt.subplot(3,3,i+1)
sns.boxplot(y=df[variable], color = 'y')
plt.tight_layout()
plt.title(variable,fontsize=30)
plt.show()
# Top 5 price values
top5 = df.loc[df['price'].nlargest(5).index.to_list(),'price'].values # Top 5 price values
df.loc[df['price'].nlargest(5).index.to_list(),:]
| price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | furnished | year_sold | zipcode2 | warm_month_sold | zip_price_cat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1068 | 7700000.000 | 6.000 | 8.000 | 12050.000 | 27600.000 | 2.500 | 0.000 | 3.000 | 4.000 | 13.000 | 8570.000 | 3480.000 | 1910.000 | 1987.000 | 47.630 | -122.323 | 3940.000 | 8800.000 | 1.000 | 2014 | 9810 | 1.000 | high_price |
| 10718 | 7060000.000 | 5.000 | 4.500 | 10040.000 | 37325.000 | 2.000 | 1.000 | 2.000 | 3.000 | 11.000 | 7680.000 | 2360.000 | 1940.000 | 2001.000 | 47.650 | -122.214 | 3930.000 | 25449.000 | 1.000 | 2014 | 9800 | 1.000 | high_price |
| 10639 | 6890000.000 | 6.000 | 7.750 | 9890.000 | 31374.000 | 2.000 | 0.000 | 4.000 | 3.000 | 13.000 | 8860.000 | 1030.000 | 2001.000 | 0.000 | 47.630 | NaN | 4540.000 | 42730.000 | 1.000 | 2014 | 9803 | 1.000 | medium_price |
| 12794 | 5570000.000 | 5.000 | 5.750 | 9200.000 | 35069.000 | 2.000 | 0.000 | 0.000 | 3.000 | 13.000 | 6200.000 | 3000.000 | 2001.000 | 0.000 | 47.629 | -122.233 | 3560.000 | 24345.000 | 1.000 | 2014 | 9803 | 1.000 | medium_price |
| 1031 | 5350000.000 | 5.000 | 5.000 | 8000.000 | 23985.000 | 2.000 | 0.000 | 4.000 | 3.000 | 12.000 | 6720.000 | 1280.000 | 2009.000 | 0.000 | 47.623 | -122.220 | 4600.000 | 21750.000 | 1.000 | 2015 | 9800 | 1.000 | high_price |
# Top 5 living measure values
top5 = df.loc[df['living_measure'].nlargest(5).index.to_list(),'living_measure'].values # Top 5 living measure values
df.loc[df['living_measure'].nlargest(5).index.to_list(),:]
| price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | furnished | year_sold | zipcode2 | warm_month_sold | zip_price_cat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7928 | 2280000.000 | 7.000 | 8.000 | 13540.000 | 307752.000 | 3.000 | 0.000 | 4.000 | 3.000 | 12.000 | 9410.000 | 4130.000 | 1999.000 | 0.000 | 47.667 | -121.986 | 4850.000 | 217800.000 | 1.000 | 2014 | 9805 | 1.000 | medium_price |
| 1068 | 7700000.000 | 6.000 | 8.000 | 12050.000 | 27600.000 | 2.500 | 0.000 | 3.000 | 4.000 | 13.000 | 8570.000 | 3480.000 | 1910.000 | 1987.000 | 47.630 | -122.323 | 3940.000 | 8800.000 | 1.000 | 2014 | 9810 | 1.000 | high_price |
| 10718 | 7060000.000 | 5.000 | 4.500 | 10040.000 | 37325.000 | 2.000 | 1.000 | 2.000 | 3.000 | 11.000 | 7680.000 | 2360.000 | 1940.000 | 2001.000 | 47.650 | -122.214 | 3930.000 | 25449.000 | 1.000 | 2014 | 9800 | 1.000 | high_price |
| 10639 | 6890000.000 | 6.000 | 7.750 | 9890.000 | 31374.000 | 2.000 | 0.000 | 4.000 | 3.000 | 13.000 | 8860.000 | 1030.000 | 2001.000 | 0.000 | 47.630 | NaN | 4540.000 | 42730.000 | 1.000 | 2014 | 9803 | 1.000 | medium_price |
| 1245 | 4670000.000 | 5.000 | 6.750 | 9640.000 | 13068.000 | 1.000 | 1.000 | 4.000 | 3.000 | 12.000 | 4820.000 | 4820.000 | 1983.000 | 2009.000 | 47.557 | -122.210 | 3270.000 | 10454.000 | 1.000 | 2014 | 9804 | 1.000 | high_price |
# Top 5 lot measure values
top5 = df.loc[df['lot_measure'].nlargest(5).index.to_list(),'lot_measure'].values # Top 5 living measure values
df.loc[df['lot_measure'].nlargest(5).index.to_list(),:]
| price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | furnished | year_sold | zipcode2 | warm_month_sold | zip_price_cat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11674 | 700000.000 | 4.000 | 1.000 | 1300.000 | 1651359.000 | 1.000 | 0.000 | 3.000 | 4.000 | 6.000 | 1300.000 | 0.000 | 1920.000 | 0.000 | 47.231 | -122.023 | 2560.000 | 425581.000 | 0.000 | 2015 | 9802 | 1.000 | medium_price |
| 580 | 190000.000 | 2.000 | 1.000 | 710.000 | 1164794.000 | 1.000 | 0.000 | 0.000 | 2.000 | 5.000 | 710.000 | 0.000 | 1915.000 | 0.000 | 47.689 | -121.909 | 1680.000 | 16730.000 | 0.000 | 2015 | 9801 | 1.000 | medium_price |
| 3234 | 542500.000 | 5.000 | 3.250 | 3010.000 | 1074218.000 | 1.500 | 0.000 | 0.000 | 5.000 | 8.000 | 2010.000 | 1000.000 | 1931.000 | 0.000 | 47.456 | -122.004 | 2450.000 | 68825.000 | 0.000 | 2014 | 9802 | 1.000 | medium_price |
| 21402 | 855000.000 | 4.000 | 3.500 | 4030.000 | 1024068.000 | 2.000 | 0.000 | 0.000 | 3.000 | 10.000 | 4030.000 | 0.000 | 2006.000 | 0.000 | 47.462 | -121.744 | 1830.000 | 11700.000 | 1.000 | 2015 | 9804 | 0.000 | high_price |
| 5643 | 998000.000 | 4.000 | 3.250 | 3770.000 | 982998.000 | 2.000 | 0.000 | 0.000 | 3.000 | 10.000 | 3770.000 | 0.000 | 1992.000 | 0.000 | 47.414 | -122.087 | 2290.000 | 37141.000 | 1.000 | 2014 | 9805 | 1.000 | medium_price |
# Top 5 basement values
top5 = df.loc[df['basement'].nlargest(5).index.to_list(),'basement'].values # Top 5 living measure values
df.loc[df['basement'].nlargest(5).index.to_list(),:]
| price | room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | furnished | year_sold | zipcode2 | warm_month_sold | zip_price_cat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1245 | 4670000.000 | 5.000 | 6.750 | 9640.000 | 13068.000 | 1.000 | 1.000 | 4.000 | 3.000 | 12.000 | 4820.000 | 4820.000 | 1983.000 | 2009.000 | 47.557 | -122.210 | 3270.000 | 10454.000 | 1.000 | 2014 | 9804 | 1.000 | high_price |
| 7928 | 2280000.000 | 7.000 | 8.000 | 13540.000 | 307752.000 | 3.000 | 0.000 | 4.000 | 3.000 | 12.000 | 9410.000 | 4130.000 | 1999.000 | 0.000 | 47.667 | -121.986 | 4850.000 | 217800.000 | 1.000 | 2014 | 9805 | 1.000 | medium_price |
| 1017 | 3200000.000 | 4.000 | 3.250 | 7000.000 | 28206.000 | 1.000 | 1.000 | 4.000 | 4.000 | 12.000 | 3500.000 | 3500.000 | 1991.000 | 0.000 | 47.593 | -122.086 | 4913.000 | 14663.000 | 1.000 | 2014 | 9807 | 1.000 | high_price |
| 1068 | 7700000.000 | 6.000 | 8.000 | 12050.000 | 27600.000 | 2.500 | 0.000 | 3.000 | 4.000 | 13.000 | 8570.000 | 3480.000 | 1910.000 | 1987.000 | 47.630 | -122.323 | 3940.000 | 8800.000 | 1.000 | 2014 | 9810 | 1.000 | high_price |
| 2668 | 1900000.000 | 5.000 | 4.250 | 6510.000 | 16471.000 | 2.000 | 0.000 | 3.000 | 4.000 | 11.000 | 3250.000 | 3260.000 | 1980.000 | 0.000 | 47.576 | -122.242 | 4480.000 | 16471.000 | 1.000 | 2014 | 9804 | 1.000 | high_price |
# Lets treat outliers by flooring and capping
def treat_outliers(dataframe,col_list):
'''
treats outliers in a varaible by capping the variables values
col: str, name of the numerical varaible
dataframe: data frame
'''
data = dataframe.copy()
for col in col_list:
Q1=data[col].quantile(0.25) # 25th quantile
Q3=data[col].quantile(0.75) # 75th quantile
IQR=Q3-Q1
Lower_Whisker = Q1 - 1.5*IQR
Upper_Whisker = Q3 + 1.5*IQR
data[col] = np.clip(data[col], Lower_Whisker, Upper_Whisker) # all the values samller than Lower_Whisker will be assigned value of Lower_whisker
# and all the values above upper_whishker will be assigned value of upper_Whisker
return data
def log_transform(dataframe,col_list):
'''
Performs log transformation on selected variables
col: str, name of the numerical varaible
dataframe: data frame
'''
data = dataframe.copy()
for col in col_list:
if min(data[col]) == 0:
data[f'log_{col}'] = np.log(data[col]+1)
else:
data[f'log_{col}'] = np.log(data[col])
return data
As I can rationalize the extreme outliers, I will treat them as values that do represent the process for their respective variables and therefore not cap them. I will only perform log transformation on these variables and check the results
log_con_vars = []
for col in continuous_vars:
log_con_vars.append(f'log_{col}')
df_log = log_transform(df, continuous_vars)
plt.figure(figsize=(17,75))
for i in range(len(log_con_vars)):
plt.subplot(18,3,i+1)
plt.hist(df_log[log_con_vars[i]])
#sns.displot(df[all_col[i]], kde=True)
plt.tight_layout()
plt.title(log_con_vars[i],fontsize=25)
plt.show()
Basements log transformation is still heavily skewed. Given the high tails and variance, I think there is a chance that basement is actually a mix of Gaussians. I will create a binned version of this variable as an alternative to try out in the models.
The rest of the variables are close to being approximately normally distributed after the log transformation.
I am binning basement into three categories: those with no basement, those who have a basement and a size this up to 50 percentile among those who have a basement, and then those who have a basement above larger than then 50 percentile. The categories are labeled as No Basement, Small Basement, and Large Basement.
# Getting the smallest basement value that is not equal to 0
df_log.loc[df_log['basement']!=0,'basement'].nsmallest(3)
6709 10.000 16454 10.000 16314 20.000 Name: basement, dtype: float64
# Creating a new variable Basement Category by binning the basement
df_log["basement_category"] = pd.cut(
x=df_log["basement"],
bins=[0.0, 10.0, 700.0, np.inf],
right=False,
labels=["No Basement", "Small Basement", "Large Basement"])
# Confirming the categories are defined as expected. If basement is 0, basement_category should be No Basement and vice versa
print('When basement==0:\n',df_log.loc[df_log['basement']==0, 'basement_category'].value_counts(),'\n')
print('When basement!=0:\n',df_log.loc[df_log['basement']!=0, 'basement_category'].value_counts())
When basement==0: No Basement 13125 Small Basement 0 Large Basement 0 Name: basement_category, dtype: int64 When basement!=0: Large Basement 4376 Small Basement 4111 No Basement 0 Name: basement_category, dtype: int64
X = df_log.drop(["price", "log_price",'zipcode2'], axis=1)
Y = df_log[["price", "log_price"]]
def encode_cat_vars(x):
x = pd.get_dummies(
x,
columns=x.select_dtypes(include=["object", "category"]).columns.tolist(),
drop_first=True)
return x
null_cat = []
for i in cols_to_cat:
if df_log[i].isnull().any() == True:
null_cat.append(i)
null_cat
['furnished', 'coast', 'ceil']
X
| room_bed | room_bath | living_measure | lot_measure | ceil | coast | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | furnished | year_sold | warm_month_sold | zip_price_cat | log_living_measure | log_lot_measure | log_ceil_measure | log_basement | log_living_measure15 | log_lot_measure15 | basement_category | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.000 | 1.750 | 3050.000 | 9440.000 | 1.000 | 0.000 | 0.000 | 3.000 | 8.000 | 1800.000 | 1250.000 | 1966.000 | 0.000 | 47.723 | -122.183 | 2020.000 | 8660.000 | 0.000 | 2015 | 1.000 | medium_price | 8.023 | 9.153 | 7.496 | 7.132 | 7.611 | 9.066 | Large Basement |
| 1 | 2.000 | 1.000 | 670.000 | 3101.000 | 1.000 | 0.000 | 0.000 | 4.000 | 6.000 | 670.000 | 0.000 | 1948.000 | 0.000 | 47.555 | -122.274 | 1660.000 | 4100.000 | 0.000 | 2015 | 1.000 | high_price | 6.507 | 8.039 | 6.507 | 0.000 | 7.415 | 8.319 | No Basement |
| 2 | 4.000 | 2.750 | 3040.000 | 2415.000 | 2.000 | 1.000 | 4.000 | 3.000 | 8.000 | 3040.000 | 0.000 | 1966.000 | 0.000 | 47.519 | -122.256 | 2620.000 | 2433.000 | 0.000 | 2014 | 1.000 | high_price | 8.020 | 7.789 | 8.020 | 0.000 | 7.871 | 7.797 | No Basement |
| 3 | 3.000 | 2.500 | 1740.000 | 3721.000 | 2.000 | 0.000 | 0.000 | 3.000 | 8.000 | 1740.000 | 0.000 | 2009.000 | 0.000 | 47.336 | -122.213 | 2030.000 | 3794.000 | 0.000 | 2014 | 1.000 | high_price | 7.462 | 8.222 | 7.462 | 0.000 | 7.616 | 8.241 | No Basement |
| 4 | 2.000 | 1.000 | 1120.000 | 4590.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1120.000 | 0.000 | 1924.000 | 0.000 | 47.566 | -122.285 | 1120.000 | 5100.000 | 0.000 | 2015 | 0.000 | high_price | 7.021 | 8.432 | 7.021 | 0.000 | 7.021 | 8.537 | No Basement |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21608 | 4.000 | 2.500 | 3130.000 | 60467.000 | 2.000 | 0.000 | 0.000 | 3.000 | 9.000 | 3130.000 | 0.000 | 1996.000 | 0.000 | 47.662 | -121.962 | 2780.000 | 44224.000 | 1.000 | 2015 | 1.000 | medium_price | 8.049 | 11.010 | 8.049 | 0.000 | 7.930 | 10.697 | No Basement |
| 21609 | 2.000 | 1.000 | 1030.000 | 4841.000 | 1.000 | 0.000 | 0.000 | 3.000 | 7.000 | 920.000 | 110.000 | 1939.000 | 0.000 | 47.686 | -122.341 | 1530.000 | 4944.000 | 0.000 | 2014 | 1.000 | high_price | 6.937 | 8.485 | 6.824 | 4.710 | 7.333 | 8.506 | Small Basement |
| 21610 | 3.000 | 3.750 | 3710.000 | 34412.000 | 2.000 | 0.000 | 0.000 | 3.000 | 10.000 | 2910.000 | 800.000 | 1978.000 | 0.000 | 47.589 | -122.040 | 2390.000 | 34412.000 | 1.000 | 2014 | 1.000 | high_price | 8.219 | 10.446 | 7.976 | 6.686 | 7.779 | 10.446 | Large Basement |
| 21611 | 4.000 | 2.500 | 1560.000 | 7800.000 | 2.000 | 0.000 | 0.000 | 3.000 | 7.000 | 1560.000 | 0.000 | 1997.000 | 0.000 | 47.514 | -122.316 | 1160.000 | 7800.000 | 0.000 | 2015 | 0.000 | low_price | 7.352 | 8.962 | 7.352 | 0.000 | 7.056 | 8.962 | No Basement |
| 21612 | 4.000 | 2.500 | 1940.000 | 4875.000 | 2.000 | 0.000 | 0.000 | 4.000 | 9.000 | 1940.000 | 0.000 | 1925.000 | 0.000 | 47.643 | -122.304 | 1790.000 | 4875.000 | 1.000 | 2014 | 0.000 | high_price | 7.570 | 8.492 | 7.570 | 0.000 | 7.490 | 8.492 | No Basement |
21613 rows × 28 columns
# Encoding categorical vars
ind_vars_num = encode_cat_vars(X)
for i in df_log.columns[df_log.isnull().any()].to_list():
ind_vars_num.loc[df_log[i].isnull(), ind_vars_num.columns.str.startswith(f"{i}_")] = np.nan
ind_vars_num.head()
| room_bed | room_bath | living_measure | lot_measure | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | log_living_measure | log_lot_measure | log_ceil_measure | log_basement | log_living_measure15 | log_lot_measure15 | ceil_1.5 | ceil_2.0 | ceil_2.5 | ceil_3.0 | ceil_3.5 | coast_1.0 | furnished_1.0 | year_sold_2015 | warm_month_sold_1.0 | zip_price_cat_medium_price | zip_price_cat_high_price | basement_category_Small Basement | basement_category_Large Basement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.000 | 1.750 | 3050.000 | 9440.000 | 0.000 | 3.000 | 8.000 | 1800.000 | 1250.000 | 1966.000 | 0.000 | 47.723 | -122.183 | 2020.000 | 8660.000 | 8.023 | 9.153 | 7.496 | 7.132 | 7.611 | 9.066 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1 | 1 | 1 | 0 | 0.000 | 1.000 |
| 1 | 2.000 | 1.000 | 670.000 | 3101.000 | 0.000 | 4.000 | 6.000 | 670.000 | 0.000 | 1948.000 | 0.000 | 47.555 | -122.274 | 1660.000 | 4100.000 | 6.507 | 8.039 | 6.507 | 0.000 | 7.415 | 8.319 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1 | 1 | 0 | 1 | 0.000 | 0.000 |
| 2 | 4.000 | 2.750 | 3040.000 | 2415.000 | 4.000 | 3.000 | 8.000 | 3040.000 | 0.000 | 1966.000 | 0.000 | 47.519 | -122.256 | 2620.000 | 2433.000 | 8.020 | 7.789 | 8.020 | 0.000 | 7.871 | 7.797 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0 | 1 | 0 | 1 | 0.000 | 0.000 |
| 3 | 3.000 | 2.500 | 1740.000 | 3721.000 | 0.000 | 3.000 | 8.000 | 1740.000 | 0.000 | 2009.000 | 0.000 | 47.336 | -122.213 | 2030.000 | 3794.000 | 7.462 | 8.222 | 7.462 | 0.000 | 7.616 | 8.241 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 1 | 0 | 1 | 0.000 | 0.000 |
| 4 | 2.000 | 1.000 | 1120.000 | 4590.000 | 0.000 | 3.000 | 7.000 | 1120.000 | 0.000 | 1924.000 | 0.000 | 47.566 | -122.285 | 1120.000 | 5100.000 | 7.021 | 8.432 | 7.021 | 0.000 | 7.021 | 8.537 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1 | 0 | 0 | 1 | 0.000 | 0.000 |
ind_vars_num.isna().sum()
room_bed 126 room_bath 118 living_measure 17 lot_measure 42 sight 57 condition 85 quality 1 ceil_measure 72 basement 1 yr_built 15 yr_renovated 0 lat 0 long 34 living_measure15 166 lot_measure15 29 log_living_measure 17 log_lot_measure 42 log_ceil_measure 1 log_basement 1 log_living_measure15 166 log_lot_measure15 29 ceil_1.5 72 ceil_2.0 72 ceil_2.5 72 ceil_3.0 72 ceil_3.5 72 coast_1.0 31 furnished_1.0 29 year_sold_2015 0 warm_month_sold_1.0 0 zip_price_cat_medium_price 0 zip_price_cat_high_price 0 basement_category_Small Basement 1 basement_category_Large Basement 1 dtype: int64
Splitting the data into training and testing data prior to imputation to avoid data leak of test data information to train data and biasing our model.
from sklearn.model_selection import train_test_split # Splitting data into train and test data
x_train, x_test, y_train, y_test = train_test_split(ind_vars_num, Y, test_size=0.3, random_state=1)
# defining function for imputing missing values using KNN
def imputation(data):
mm = MinMaxScaler() # for standardizing variables prior to imputation
data2 = data.copy()
sca = mm.fit_transform(data2) # standardizing data
knn_imputer = KNNImputer()
knn = knn_imputer.fit_transform(sca)
data2.iloc[:,:] = mm.inverse_transform(knn) # returning to original data
return data2
x_train_im = imputation(x_train) #imputing missing values for training dataset
x_test_im = imputation(x_test) #imputing missing values for testing dataset
# Rounding floats to integers.
# Rounding to whole numbers except for log transformed variables as the other float vars (bathroom & ceil) are dummy vars now
x_train_im[x_train_im.columns[~pd.Series(x_train_im.columns).str.startswith('log')]] = round(
x_train_im[x_train_im.columns[~pd.Series(x_train_im.columns).str.startswith('log')]])
x_train_im[x_train_im.columns[~pd.Series(x_train_im.columns).str.startswith('log')]] = round(
x_train_im[x_train_im.columns[~pd.Series(x_train_im.columns).str.startswith('log')]])
# Checking training data's unique values to make sure they match expectations after imputation and rounding
columns = x_train_im.columns
for col in columns:
print('Unique Values of {} are \n'.format(col),x_train_im[col].unique())
print('*'*100)
Unique Values of room_bed are [2. 3. 4. 5. 1. 6. 7. 8. 9.] **************************************************************************************************** Unique Values of room_bath are [2. 1. 3. 4. 5. 6. 8. 0. 7.] **************************************************************************************************** Unique Values of living_measure are [ 1830. 1380. 1130. 1820. 1660. 4890. 3860. 2830. 1630. 2120. 1560. 1200. 2960. 2100. 2450. 2160. 1810. 2000. 1940. 3400. 2620. 1260. 3780. 5230. 3800. 390. 1930. 2670. 2340. 750. 3700. 2590. 4060. 2140. 2840. 1240. 2310. 1070. 2110. 1740. 2380. 1520. 2480. 2200. 4040. 1870. 1600. 1760. 1280. 1370. 1970. 1150. 1920. 2790. 2910. 2320. 2580. 2980. 2300. 1010. 2260. 1080. 3790. 3050. 4420. 1270. 1300. 2190. 3510. 2060. 2540. 1950. 5570. 1350. 2350. 3270. 2630. 1610. 1710. 2940. 1190. 2820. 2434. 1290. 3100. 1340. 1050. 3550. 1500. 2410. 2460. 1030. 2330. 1580. 920. 1780. 2170. 3410. 1800. 1020. 840. 1530. 960. 950. 2270. 930. 2090. 1410. 810. 1990. 4740. 2210. 3740. 3460. 1626. 3250. 1960. 990. 1620. 1060. 1320. 1450. 4020. 720. 3110. 2440. 4320. 1584. 1750. 2600. 850. 1420. 1490. 1690. 790. 3990. 1230. 1159. 3720. 1460. 1790. 3361. 1000. 3540. 2020. 2430. 3370. 820. 2990. 2610. 1510. 3180. 3150. 3580. 1640. 1400. 2870. 1850. 1100. 2010. 3640. 3040. 1720. 1180. 2070. 2560. 2040. 2495. 2880. 2690. 2520. 1782. 3300. 3120. 3770. 2050. 2810. 3900. 1980. 1730. 2370. 1250. 4510. 3880. 1160. 3020. 1550. 2230. 3340. 1440. 4170. 2800. 1890. 4100. 3070. 2510. 3220. 3330. 3200. 2710. 1220. 3090. 1360. 1670. 2470. 3290. 1110. 2240. 1040. 1090. 1480. 550. 1590. 1470. 2740. 3470. 4190. 2400. 2780. 5440. 1430. 770. 1770. 740. 3380. 2760. 900. 3620. 2500. 1900. 3130. 2920. 2280. 1880. 2660. 1840. 1330. 2950. 1210. 1680. 3660. 1310. 1170. 2290. 5310. 1650. 6980. 2720. 2650. 2150. 1084. 3136. 940. 2530. 3504. 2850. 2770. 2007. 2220. 2550. 1390. 2730. 4700. 3060. 620. 3560. 2390. 2099. 4240. 860. 1120. 4150. 1540. 2700. 4070. 3230. 7100. 3610. 6930. 3570. 3750. 4030. 660. 3170. 3480. 3030. 980. 5070. 3085. 800. 2680. 5430. 3670. 2250. 1656. 3360. 1384. 2750. 2890. 2490. 3000. 4830. 2860. 4850. 690. 3980. 3680. 1700. 760. 3630. 3010. 3140. 3960. 5040. 3440. 3710. 4620. 3870. 2180. 1570. 1812. 890. 1140. 520. 2570. 970. 3600. 3390. 2030. 780. 3280. 3240. 2420. 3190. 880. 2732. 3080. 1910. 4230. 2286. 1860. 3260. 3320. 870. 4460. 1714. 2235. 4560. 910. 1494. 4250. 2298. 4010. 5860. 830. 1008. 2692. 3520. 3830. 3490. 4500. 1405. 3160. 3920. 4210. 2130. 2963. 2601. 7400. 560. 2360. 2906. 3890. 3760. 2675. 3530. 4050. 4930. 3450. 5020. 3236. 3350. 5620. 4380. 580. 828. 4960. 630. 700. 2930. 2701. 4080. 3500. 4270. 3840. 3906. 460. 290. 3223. 893. 4910. 1611. 4590. 2080. 1728. 2640. 1952. 3430. 2864. 3820. 5850. 1954. 5660. 2424. 4780. 1794. 3730. 710. 2068. 3590. 2753. 4490. 6300. 5160. 2519. 4370. 13540. 3650. 2656. 3810. 5030. 3420. 6160. 3052. 6330. 4110. 3931. 4750. 4200. 7120. 2303. 1984. 3273. 2900. 4870. 4280. 4430. 4800. 1495. 1413. 4720. 3310. 2242. 3950. 5010. 1608. 10040. 1651. 2643. 1987. 5220. 2876. 1481. 6200. 6390. 3672. 5410. 3065. 730. 4290. 2970. 530. 2605. 4440. 2544. 3690. 1921. 1578. 650. 1445. 2568. 5480. 4310. 1964. 670. 1676. 3545. 4470. 2154. 4300. 4140. 4340. 2612. 2425. 4350. 3569. 2708. 5370. 1444. 1814. 4860. 1358. 2796. 5360. 4390. 4090. 5540. 982. 1995. 3940. 3930. 1894. 6050. 4220. 2414. 2305. 2623. 2689. 2803. 4130. 2064. 1278. 3695. 1805. 2815. 1522. 600. 2598. 3850. 3362. 4610. 1296. 4410. 2413. 4120. 680. 2115. 2678. 1852. 2891. 2483. 2993. 2885. 490. 1572. 2849. 4640. 2714. 1981. 6500. 640. 5240. 4180. 1556. 6880. 5330. 2531. 7880. 3210. 4570. 1726. 998. 2658. 3001. 5190. 1252. 4660. 1677. 4000. 4450. 2506. 1425. 1212. 4600. 1628. 3266. 6563. 5550. 3216. 2301. 2092. 2134. 4580. 4330. 1233. 8670. 1833. 2311. 3910. 2518. 2927. 2238. 5470. 4260. 5170. 5050. 2038. 1741. 1798. 4980. 2168. 4670. 2326. 2655. 4530. 4360. 3028. 5300. 3902. 5180. 1092. 1975. 1936. 2798. 5150. 3970. 2665. 2475. 5280. 1606. 5400. 4650. 2502. 2575. 2718. 2577. 1899. 4790. 3845. 380. 2025. 1061. 1108. 3078. 2672. 3148. 5290. 1095. 5610. 3284. 1068. 1765. 4285. 1484. 1643. 2166. 4970. 1678. 2217. 5700. 5270. 1808. 6040. 2095. 3831. 1255. 2641. 833. 3118. 3087. 1264. 4065. 6070. 5990. 1747. 2145. 4160. 2253. 5930. 4630. 1129. 2449. 2578. 2233. 5770. 590. 5000. 2793. 3488. 7050. 7270. 570. 5450. 6110. 5830. 1453. 7440. 1979. 2789. 2075. 1088. 1763. 1914. 6260. 1392. 1904. 2181. 5600. 6370. 2192. 3828. 2163. 2208. 1552. 2044. 2669. 2497. 4690. 12050. 2738. 1451. 2009. 4770. 3526. 4940. 2019. 2195. 5130. 5520. 1646. 3366. 420. 5820. 1347. 1785. 3655. 844. 2153. 2846. 4133. 4680. 2014. 4480. 1912. 2441. 2398. 1315. 3444. 4810. 2628. 3274. 2251. 9640. 3064. 1396. 5780. 1992. 2074. 480. 1861. 1889. 1164. 2382. 2031. 7730. 1613. 3732. 5510. 1679. 2517. 6840. 2653. 3276. 440. 3135. 1333. 2283. 2105. 1654. 1996. 2395. 2198. 1144. 3192. 1639. 2437. 901. 1496. 5080. 3753. 1498. 1397. 1275. 1502. 2206. 1295. 4550. 5403. 3181. 2496. 5940. 1435. 1352. 6400. 5420. 2961. 1489. 3133. 2156. 9200. 1175. 2744. 2093. 5635. 1778. 2267. 7420. 6630. 2056. 4730. 1834. 5545. 5790. 2336. 1256. 5350. 6550. 1532. 2201. 5120. 3674. 2783. 6510. 2811. 2085. 3527. 2717. 5730. 2547. 7320. 2416. 2331. 4400. 1847. 2375. 1757. 1427. 2979. 2344. 2478. 3281. 2155. 5250. 6430. 5060. 4073. 2406. 3316. 2415. 1876. 5810. 1365. 962. 6085. 9890. 2341. 1072. 2632. 2529. 7480. 1463. 1946. 4495. 902. 2473. 2448. 3206. 5584. 2423. 1322. 5840. 5110. 1509. 1845. 2329. 5530. 1615. 1381. 988. 610. 1961. 1048. 2905. 2034. 384. 3691. 3172. 8000.] **************************************************************************************************** Unique Values of lot_measure are [ 2856. 5820. 6908. ... 7545. 9629. 10866.] **************************************************************************************************** Unique Values of sight are [0. 1. 3. 2. 4.] **************************************************************************************************** Unique Values of condition are [3. 4. 2. 5. 1.] **************************************************************************************************** Unique Values of quality are [ 7. 6. 8. 13. 9. 10. 11. 4. 5. 12. 1. 3.] **************************************************************************************************** Unique Values of ceil_measure are [1830. 1380. 1130. 1220. 980. 4890. 2870. 2830. 1630. 1060. 1260. 1000. 2160. 1770. 1440. 1290. 3400. 2620. 3780. 4450. 3800. 390. 1480. 2020. 1620. 750. 1850. 2590. 4060. 2140. 2840. 1240. 2310. 1070. 1600. 1740. 1760. 1520. 2480. 1420. 1250. 1090. 880. 1280. 1370. 950. 1080. 2790. 2910. 2090. 1540. 1100. 1430. 1010. 2260. 3790. 3050. 4420. 1270. 1300. 1170. 3510. 1160. 2010. 1350. 3860. 2110. 2350. 1640. 1200. 1610. 1710. 2220. 1050. 1190. 2230. 2434. 1660. 3100. 1340. 2320. 1500. 1030. 1730. 1590. 920. 2450. 2170. 2190. 1800. 1020. 840. 1530. 960. 1820. 930. 1580. 1410. 810. 1990. 4740. 1550. 2330. 1780. 1150. 1419. 2940. 1960. 2000. 760. 1450. 4020. 720. 3020. 1940. 1320. 3190. 1584. 1750. 2600. 850. 1210. 1490. 990. 790. 2710. 1230. 1159. 3720. 1790. 3361. 1970. 1560. 2430. 3200. 820. 1510. 2300. 3180. 3150. 3580. 1400. 1650. 1570. 3640. 3040. 1720. 1180. 1920. 2040. 1870. 2495. 2880. 2690. 2610. 1782. 2060. 1980. 1690. 2120. 2440. 2050. 1900. 3900. 1470. 3270. 3880. 2460. 2580. 4170. 2800. 2990. 2500. 2100. 2510. 3220. 3330. 1910. 3090. 970. 1360. 2380. 2340. 1140. 3290. 1040. 3550. 1890. 550. 2740. 3470. 2400. 5440. 770. 740. 900. 2730. 2470. 1460. 2890. 2520. 2280. 1110. 1686. 1120. 1330. 1810. 2950. 1930. 3660. 1310. 3130. 3650. 5330. 2270. 2240. 2150. 1084. 3136. 2630. 940. 2530. 3504. 1680. 2770. 2007. 2550. 1138. 3910. 3060. 620. 3560. 2099. 860. 4240. 1116. 4150. 2700. 3460. 4070. 5240. 3610. 4310. 3570. 3750. 1880. 4030. 660. 2210. 3170. 3480. 3030. 1424. 5070. 680. 1498. 2680. 780. 2660. 4010. 3670. 2250. 1840. 1656. 3360. 2370. 1670. 1144. 2750. 4830. 1950. 4320. 3850. 690. 3980. 3680. 910. 3110. 2180. 1860. 3500. 1266. 2570. 3960. 5040. 3440. 2130. 4620. 2760. 2560. 3770. 1700. 1812. 890. 3410. 2390. 1390. 520. 3540. 2030. 2070. 3280. 3010. 2420. 2650. 2980. 4190. 3210. 3620. 2286. 2780. 2920. 2670. 2290. 3260. 800. 3320. 870. 1714. 2540. 1494. 830. 2298. 2850. 3870. 4910. 3630. 4040. 1008. 2692. 2200. 3530. 2860. 3340. 3080. 3120. 1165. 4210. 3310. 2963. 2601. 6290. 560. 2360. 2906. 2900. 2960. 2675. 3930. 5020. 3236. 670. 4700. 4380. 3970. 580. 828. 4230. 3710. 2720. 3230. 3760. 3420. 630. 700. 2930. 2701. 4080. 3160. 4270. 3906. 460. 290. 3223. 893. 1611. 3300. 1728. 2640. 1652. 3430. 3370. 3690. 2864. 4670. 1954. 2410. 4100. 2424. 2810. 4780. 1794. 710. 2068. 3590. 2165. 3390. 4360. 914. 5160. 2519. 9410. 2490. 2656. 3250. 3052. 4900. 4750. 3490. 2820. 4200. 5480. 2303. 1984. 2671. 2080. 490. 1405. 1413. 3520. 2242. 4000. 1608. 7680. 1651. 2643. 1987. 3450. 730. 1481. 3240. 4440. 4560. 3006. 5050. 3065. 3730. 4290. 530. 590. 3140. 1355. 3740. 2544. 1921. 1578. 650. 4930. 2568. 3380. 1697. 1964. 1676. 2154. 4340. 2612. 2425. 3569. 2708. 4280. 5230. 5370. 1427. 3070. 1444. 944. 1358. 2796. 4860. 3950. 806. 3820. 3700. 3000. 2414. 600. 2305. 3920. 2623. 2689. 2803. 4130. 2064. 1002. 3695. 3810. 3350. 1805. 2815. 1248. 2598. 3362. 4610. 1296. 2413. 4120. 2115. 2678. 1852. 2618. 2483. 2233. 4430. 3600. 2885. 765. 1572. 3990. 2849. 2714. 1981. 5180. 570. 4140. 640. 1556. 2531. 7880. 3830. 4570. 1726. 4250. 798. 2658. 3001. 5190. 992. 4660. 1677. 1274. 2506. 5310. 1425. 1212. 1628. 3266. 5153. 4850. 3216. 2301. 2092. 2134. 4580. 3840. 963. 6120. 1833. 2311. 2518. 2927. 2238. 4590. 1098. 3890. 4260. 2038. 1774. 4350. 1446. 2970. 1798. 4980. 2168. 4090. 1741. 4110. 2655. 3028. 2782. 1092. 1975. 4640. 1936. 2876. 1256. 2665. 2475. 1606. 1382. 5400. 2502. 2575. 5010. 2718. 2577. 4490. 1899. 4790. 3845. 380. 4410. 2025. 1061. 1108. 3078. 3148. 4540. 1095. 3284. 1068. 1454. 1765. 3485. 1484. 1643. 2166. 1678. 2217. 4460. 1808. 1295. 3831. 1255. 2641. 1174. 833. 3118. 4330. 3087. 1094. 1264. 4065. 6070. 5990. 1747. 4050. 2145. 1384. 4180. 2253. 1272. 1129. 2578. 5770. 5000. 4160. 978. 2793. 3488. 1898. 480. 6420. 5450. 1432. 6110. 5830. 5220. 1453. 5550. 1329. 2789. 1995. 4220. 4500. 2075. 1088. 1763. 1914. 4840. 1392. 1904. 2181. 2798. 6370. 3828. 1012. 3940. 882. 610. 3213. 2163. 2208. 1552. 2044. 2669. 2497. 8570. 2738. 1451. 2009. 3526. 2019. 5130. 1646. 2966. 1326. 4300. 5430. 420. 1347. 1595. 2174. 844. 2176. 2153. 1976. 4133. 1447. 4390. 1858. 2014. 4480. 1912. 2441. 2398. 1315. 4600. 995. 4810. 2628. 3274. 4820. 3064. 1396. 1288. 1278. 2074. 1889. 2382. 2031. 2606. 6660. 1613. 2932. 1679. 2517. 2653. 3276. 440. 3135. 1333. 2283. 1044. 2105. 2686. 1654. 1122. 1996. 2395. 2198. 3192. 1479. 4470. 2437. 901. 5080. 3336. 1397. 1275. 2302. 1502. 4510. 866. 1105. 5403. 3181. 2496. 4950. 1435. 1352. 4285. 2961. 1489. 2192. 1192. 2533. 2156. 6200. 1175. 2744. 2093. 1778. 2267. 7420. 1048. 1834. 4370. 3605. 2201. 3674. 2783. 2811. 3527. 2717. 2547. 2095. 7320. 2331. 4400. 1847. 2375. 1757. 4630. 2979. 2478. 3281. 2155. 5250. 6430. 5060. 6050. 1564. 4073. 2406. 3316. 2415. 1876. 2052. 962. 6085. 2732. 8860. 2341. 1072. 2632. 2529. 1463. 1946. 902. 2473. 5140. 2448. 3206. 5584. 2423. 1322. 2056. 1509. 1845. 2329. 4728. 1282. 5530. 1615. 1381. 3076. 988. 1961. 2905. 384. 3691. 3172. 6720.] **************************************************************************************************** Unique Values of basement are [ 0. 600. 680. 990. 1060. 300. 200. 800. 1040. 370. 1000. 650. 780. 450. 720. 1850. 770. 510. 620. 2020. 880. 550. 840. 700. 490. 290. 1440. 1010. 870. 1020. 900. 530. 1710. 1160. 660. 590. 1290. 1100. 740. 360. 1220. 890. 1410. 1680. 207. 790. 310. 110. 1030. 220. 90. 760. 1120. 1130. 1280. 1570. 170. 980. 1090. 630. 440. 1140. 810. 950. 400. 480. 970. 320. 960. 1240. 260. 940. 1190. 910. 520. 710. 1470. 430. 930. 1600. 730. 1330. 150. 250. 470. 580. 420. 1050. 1200. 80. 1690. 540. 830. 1500. 1380. 240. 1660. 1650. 410. 120. 1170. 640. 670. 330. 1610. 500. 1070. 414. 1400. 1480. 100. 1860. 2620. 1740. 1525. 280. 1420. 1080. 1210. 460. 1350. 850. 1830. 570. 390. 1580. 1110. 1150. 270. 1390. 340. 2160. 210. 380. 180. 140. 1620. 70. 610. 560. 1300. 862. 920. 750. 1910. 1245. 1530. 1800. 820. 1230. 1320. 1260. 1450. 265. 225. 190. 160. 1250. 1430. 1180. 860. 1510. 1550. 1340. 2720. 350. 50. 130. 1730. 690. 2350. 1310. 1270. 2170. 1370. 1560. 588. 1940. 1720. 4130. 2070. 1780. 2120. 1820. 1281. 1360. 1640. 602. 1670. 230. 2100. 1750. 1490. 1870. 2360. 40. 906. 1760. 666. 1460. 145. 1890. 2090. 935. 1540. 243. 1590. 176. 235. 374. 1900. 276. 60. 274. 2060. 1700. 2810. 1520. 1810. 2200. 2550. 356. 295. 10. 1790. 2220. 652. 2850. 266. 2330. 2150. 2010. 1880. 2730. 2300. 1960. 3480. 1135. 1950. 1481. 515. 2240. 435. 861. 4820. 704. 283. 1840. 2490. 1630. 2110. 248. 417. 2250. 2590. 3000. 1275. 1990. 1008. 3260. 475. 1770. 784. 2400. 915. 1248. 516. 2610. 875.] **************************************************************************************************** Unique Values of yr_built are [2005. 1918. 1945. 1977. 1962. 2004. 1991. 1966. 1927. 1963. 1959. 1961. 1951. 1981. 2009. 1954. 1960. 1986. 2014. 1969. 1953. 1975. 1941. 1914. 1950. 1976. 2002. 1983. 1997. 1902. 1957. 1949. 1979. 2000. 2010. 1916. 1972. 1974. 1946. 1926. 1929. 1948. 1987. 1913. 1956. 1982. 1965. 2007. 2011. 1947. 1985. 2013. 1958. 1964. 1973. 1952. 1968. 1990. 1980. 1984. 1931. 1967. 1999. 2006. 1955. 1900. 1910. 2008. 1919. 2001. 1992. 1922. 1928. 1937. 1932. 1938. 1978. 1924. 1930. 1971. 1993. 1943. 1998. 1944. 2003. 2012. 1906. 1942. 1936. 1989. 1995. 1907. 1994. 1912. 1988. 1921. 1925. 1996. 1911. 1923. 1903. 1920. 1970. 1908. 1904. 1939. 1915. 1935. 1905. 1933. 1909. 1917. 2015. 1940. 1901. 1934.] **************************************************************************************************** Unique Values of yr_renovated are [ 0. 1976. 2014. 2009. 2001. 1997. 2010. 2005. 2013. 2006. 1989. 2000. 2004. 1969. 1996. 1954. 1990. 2008. 1995. 1978. 2002. 1991. 1999. 2011. 1992. 2003. 1974. 1979. 1994. 1972. 2015. 1973. 2012. 1986. 1998. 2007. 1953. 1993. 1987. 1963. 1984. 1983. 1988. 1960. 1981. 1957. 1955. 1977. 1968. 1985. 1958. 1967. 1965. 1970. 1962. 1982. 1980. 1948. 1971. 1975. 1946. 1950. 1964. 1945. 1959. 1951. 1940. 1956.] **************************************************************************************************** Unique Values of lat are [48. 47.] **************************************************************************************************** Unique Values of long are [-122. -123. -121.] **************************************************************************************************** Unique Values of living_measure15 are [1850. 1540. 1150. 2050. 1770. 5790. 2940. 1480. 1410. 1830. 1560. 1640. 1680. 1620. 1990. 1440. 1810. 2510. 900. 1800. 3450. 2310. 3940. 2170. 2320. 1820. 1460. 3080. 1960. 4020. 2580. 1950. 1260. 2760. 1554. 1940. 1520. 1980. 1300. 2990. 2000. 1760. 2030. 1790. 2650. 2770. 1740. 1660. 1370. 2020. 2070. 1128. 2240. 1120. 2740. 1910. 3510. 1780. 1590. 2630. 3240. 2150. 3170. 2230. 2590. 3020. 1880. 1920. 1330. 1610. 2120. 1710. 1470. 2434. 2260. 1080. 3480. 4100. 1360. 2040. 2420. 1970. 1210. 2690. 1900. 2550. 1420. 2250. 960. 1690. 2180. 1630. 1450. 1580. 1500. 4800. 2370. 2680. 3200. 1400. 2340. 2560. 1030. 1240. 1310. 1340. 1140. 2405. 1040. 2530. 1350. 4010. 1584. 2060. 1280. 1600. 1430. 1200. 3030. 1020. 3112. 1490. 2200. 1860. 2390. 1730. 1130. 2570. 1530. 2730. 3000. 3090. 1390. 2140. 3370. 1720. 1250. 2500. 1230. 2460. 2300. 1550. 2490. 2880. 1642. 2350. 2360. 3330. 1510. 2270. 3820. 1870. 1650. 1270. 2010. 3280. 1220. 2450. 2090. 3420. 4670. 2960. 2840. 1890. 3770. 2660. 3130. 3220. 3150. 2980. 2790. 1938. 1170. 1070. 1670. 1100. 1700. 4300. 3590. 1684. 4830. 740. 3380. 1175. 2470. 3140. 1060. 2430. 2080. 2540. 2970. 2280. 2410. 2800. 2400. 1840. 1570. 1110. 2920. 2190. 3930. 1734. 2160. 1380. 920. 1084. 2210. 2670. 2189. 2900. 2110. 1320. 3810. 3600. 3060. 620. 3840. 2099. 3010. 3040. 3210. 990. 890. 1634. 3570. 1750. 2330. 2860. 3530. 4210. 4920. 910. 2820. 2780. 1190. 2720. 4340. 3120. 3360. 860. 1930. 1394. 2870. 2700. 2750. 2910. 2100. 4410. 3350. 3980. 2440. 3100. 830. 1768. 2810. 3290. 3050. 2950. 2253. 2610. 1180. 3440. 3674. 4620. 2290. 2380. 3410. 2318. 3430. 2130. 1160. 1568. 1090. 3730. 3960. 3550. 3300. 3890. 1766. 1154. 2890. 980. 3680. 3250. 2083. 3160. 2930. 3070. 1714. 1050. 950. 2620. 2710. 4060. 1494. 2520. 2480. 1010. 4940. 4030. 3320. 4510. 1132. 2574. 3190. 3490. 1405. 2354. 1576. 780. 2236. 3400. 1415. 3950. 3110. 760. 2665. 6110. 2108. 1664. 750. 820. 4240. 3660. 2220. 4930. 3236. 880. 2011. 1862. 3920. 3230. 3630. 4190. 3560. 828. 1772. 4090. 5200. 970. 1290. 3038. 3800. 2566. 1648. 3760. 3500. 2673. 2640. 1979. 2095. 850. 3540. 3310. 3610. 4150. 3340. 2521. 4780. 2419. 690. 3620. 2381. 4040. 3557. 710. 1668. 3650. 3260. 4760. 4850. 1282. 3390. 2600. 3640. 1399. 2518. 4050. 3460. 3710. 2767. 3468. 2808. 2850. 1448. 2586. 3740. 1188. 3270. 2996. 1445. 2830. 3180. 1495. 3880. 2009. 3670. 1138. 4080. 1386. 2406. 1481. 2728. 3750. 1639. 3860. 4390. 2358. 3520. 1614. 1921. 1092. 1264. 840. 4630. 2077. 1824. 2688. 2547. 4660. 3720. 1786. 2027. 2262. 2451. 1757. 2305. 2725. 1908. 2424. 1548. 2112. 2156. 2502. 1000. 4110. 1358. 3580. 2242. 1724. 940. 870. 1834. 1326. 1458. 4170. 1624. 2004. 4610. 3159. 3690. 3138. 810. 4130. 2096. 4160. 2798. 1522. 3715. 2014. 2166. 2608. 2075. 2018. 800. 1608. 1552. 1425. 1981. 2304. 2583. 2303. 3335. 2554. 3990. 4180. 4690. 4590. 2604. 2748. 5600. 4490. 1569. 2114. 4650. 998. 2578. 3624. 1677. 790. 2144. 4220. 4000. 1366. 4042. 1285. 1256. 3087. 4420. 1886. 720. 3850. 2815. 2092. 4140. 1466. 2615. 2458. 3470. 2967. 2456. 1388. 1732. 4570. 2634. 1424. 4400. 2168. 4290. 2002. 3830. 4230. 2879. 930. 1975. 1352. 1274. 2122. 4443. 399. 2475. 1516. 2412. 1798. 3625. 2597. 2575. 1336. 2667. 4470. 1412. 1228. 2384. 1357. 3970. 2641. 4950. 1167. 4560. 2555. 1078. 1765. 1854. 1484. 1546. 2474. 1454. 2961. 3700. 4310. 1307. 5110. 2927. 1572. 4680. 4120. 1439. 1359. 3870. 4530. 2647. 1894. 1544. 1509. 2214. 2793. 5070. 4770. 460. 2221. 5340. 1623. 4740. 2396. 2326. 5220. 1098. 1763. 1948. 1792. 2333. 1303. 1728. 2697. 4730. 3191. 1746. 1665. 3910. 1302. 3900. 1028. 3780. 2669. 2109. 1525. 2963. 2738. 2336. 3639. 1304. 4900. 1855. 1943. 1774. 1056. 2605. 4320. 1398. 2612. 3836. 1574. 2724. 1696. 1814. 3094. 3618. 2015. 2848. 4600. 2296. 1488. 2849. 1745. 3790. 4250. 2642. 4540. 1346. 1982. 2382. 2516. 2876. 1427. 2653. 1354. 3172. 1848. 3008. 2213. 4440. 1492. 2315. 2028. 4200. 2415. 2125. 1654. 4750. 1958. 1882. 2198. 2437. 2765. 3494. 1377. 1295. 1502. 700. 2056. 2496. 1472. 1537. 2533. 1914. 1708. 3726. 5610. 1666. 1309. 2091. 1566. 4480. 2517. 2316. 2598. 1616. 2488. 2441. 1847. 1216. 1443. 1805. 2344. 2394. 2279. 2155. 3543. 2426. 1406. 2873. 4280. 1767. 1876. 3426. 2822. 4070. 1916. 4495. 2037. 2052. 1698. 1442. 1888. 2527. 1918. 5030. 2495. 2998. 1678. 1638. 2065. 2197. 1268. 2142. 1934. 1364. 2175. 1381. 2146. 2256. 2066. 1961. 1156. 3425. 2704.] **************************************************************************************************** Unique Values of lot_measure15 are [ 2667. 4076. 6908. ... 7137. 47044. 8339.] **************************************************************************************************** Unique Values of log_living_measure are [7.51207125 7.22983878 7.02997291 7.50659178 7.41457288 8.49494758 8.25842246 7.94803199 7.39633529 7.65917137 7.3524411 7.09007684 7.99294455 7.64969262 7.8038433 7.6778635 7.50108212 7.60090246 7.57044325 8.13153071 7.8709296 7.138867 8.23747929 8.56216656 8.24275635 5.96614674 7.56527528 7.88983375 7.75790621 6.62007321 8.2160881 7.85941315 8.30893825 7.66856111 7.95155933 7.12286666 7.7450028 6.97541393 7.65444323 7.46164039 7.77485577 7.32646561 7.81601384 7.69621264 8.30399997 7.53369371 7.37775891 7.47306909 7.15461536 7.22256602 7.58578882 7.04751722 7.56008047 7.93379687 7.97590836 7.74932246 7.85554468 7.99967858 7.7406644 6.91770561 7.72312009 6.98471632 8.2401213 8.02289687 8.39389498 7.14677218 7.17011954 7.69165682 8.16337132 7.63046126 7.83991936 7.57558465 8.62515033 7.20785987 7.76217061 8.09254526 7.87473913 7.38398946 7.44424865 7.98616486 7.08170859 7.94449216 7.79729127 7.1623975 8.03915739 7.20042489 6.95654544 8.17470288 7.31322039 7.78738203 7.80791663 6.93731408 7.75362355 7.36518013 6.82437367 7.48436864 7.68248245 8.13446757 7.49554194 6.92755791 6.73340189 7.33302301 6.86693328 6.85646198 7.72753511 6.83518459 7.64491934 7.25134498 6.69703425 7.59588992 8.46379241 7.70074779 8.22684089 8.14902387 7.39387829 8.08641028 7.58069975 6.89770494 7.39018143 6.96602419 7.18538702 7.27931884 8.29903718 6.57925121 8.04237801 7.79975332 8.37101068 7.36770857 7.46737107 7.86326672 6.74523635 7.25841215 7.3065314 7.43248381 6.67203295 8.29154651 7.11476945 7.05531284 8.22147895 7.28619171 7.4899709 8.11999383 6.90775528 8.17188201 7.61085279 7.79564654 8.12266802 6.70930434 8.00302867 7.8671055 7.31986493 8.06463648 8.05515773 8.18311808 7.40245152 7.24422752 7.96206731 7.52294092 7.00306546 7.60589 8.19973896 8.01961279 7.45007957 7.07326972 7.63530389 7.84776254 7.62070509 7.82204401 7.96554557 7.89729647 7.83201418 7.48549161 8.10167775 8.04558828 8.23483028 7.62559507 7.94093976 8.26873183 7.59085212 7.45587669 7.77064523 7.13089883 8.41405243 8.26359043 7.05617528 8.01301211 7.34601021 7.70975686 8.11372609 7.27239839 8.33567131 7.9373747 7.54433211 8.31874225 8.02943284 7.82803803 8.07713664 8.11072758 8.07090609 7.90470391 7.10660614 8.03592637 7.21523998 7.42057891 7.81197343 8.09864284 7.01211529 7.70466668 6.94697599 6.99393298 7.29979737 6.30991828 7.3714893 7.29301768 7.9157132 8.15190987 8.34045601 7.78322402 7.93020621 8.60153434 7.26542972 6.64639051 7.47873483 6.60665019 8.12563099 7.92298596 6.80239476 8.1942293 7.82404601 7.54960917 8.04878828 7.9793389 7.73193072 7.53902706 7.8860814 7.51752085 7.19293422 7.98956045 7.09837564 7.42654907 8.20521843 7.17778242 7.06475903 7.7363071 8.57734711 7.40853057 8.8508042 7.90838716 7.88231492 7.67322312 6.98841318 8.05070338 6.84587988 7.83597458 8.16166045 7.95507427 7.9266026 7.60439635 7.70526247 7.84384864 7.23705903 7.91205689 8.45531779 8.02617019 6.42971948 8.17751582 7.77904864 7.64921632 8.35231855 6.75693239 7.02108396 8.33086361 7.3395377 7.71423114 7.90100705 8.31139828 8.08023742 8.86785006 8.19146305 8.84361509 8.18032087 8.22951112 8.30152165 6.49223984 8.06148687 8.15478757 8.0163179 6.88755257 8.5310961 8.03430694 6.68461173 7.89357207 8.59969441 8.20794694 7.7186855 7.41216033 8.11969625 7.23273314 7.91935619 7.96901178 7.82003799 8.00636757 8.48260175 7.9585769 8.48673398 6.5366916 8.2890371 8.21066803 7.43838353 6.63331843 8.19698793 8.00969536 8.05197808 8.2839993 8.52516136 8.14322675 8.21878716 8.43814998 8.26100979 7.68708016 7.3588309 7.50218649 6.79122146 7.03878354 6.25382881 7.85166118 6.87729607 8.18868912 8.1285852 7.61579107 6.65929392 8.0955987 8.08332861 7.79152282 8.0677762 6.77992191 7.91278922 8.03268488 7.55485852 8.34995727 7.73455884 7.52833177 8.08948247 8.10772006 6.76849321 8.40290405 7.4465851 7.71199651 8.4250779 6.8134446 7.30921237 8.35467426 7.73979446 8.29654652 8.67590488 6.7214257 6.91572345 7.89803969 8.16621627 8.25062008 8.15765702 8.41183268 7.24779258 8.05832731 8.27384693 8.34521793 7.66387726 7.99395755 7.86365127 8.90923528 6.32793678 7.7664169 7.97453284 8.26616444 8.23217424 7.89170466 8.16905315 8.30647216 8.50309427 8.14612951 8.52118521 8.08209328 8.11671562 8.63408694 8.384804 6.3630281 6.71901315 8.50916102 6.44571982 6.55108034 7.9827577 7.90137735 8.31385227 8.16051825 8.35936911 8.25322765 8.27026911 6.13122649 5.66988092 8.07806788 6.79458658 8.49902922 7.38461038 8.4316353 7.64012317 7.45471995 7.8785342 7.57660977 8.14031554 7.95997453 8.2480057 8.67419694 7.57763383 8.64117917 7.79317435 8.47219583 7.49220304 8.22416351 6.56526497 7.63433724 8.18590748 7.92044651 8.40960798 8.74830491 8.54869186 7.83161728 8.38251829 9.51340355 8.20248245 7.88457651 8.24538447 8.52317526 8.13739583 8.72583206 8.02355239 8.75305552 8.32117831 8.27664913 8.4658999 8.3428398 8.870663 7.7419679 7.59287029 8.09346227 7.97246602 8.49084922 8.36170829 8.39615486 8.4763712 7.30988149 7.25347038 8.45956408 8.10470347 7.7151236 8.28147086 8.51919119 7.38274645 9.21433239 7.40913644 7.87966991 7.59438124 8.56025268 7.96415572 7.30047281 8.73230457 8.76248955 8.20849175 8.59600437 8.02780285 6.59304453 8.36404201 7.99631723 6.27287701 7.86518795 8.39840966 7.84149292 8.21338174 7.56060116 7.3639135 6.47697236 7.2758646 7.85088266 8.60886038 8.36869318 7.58273849 6.50727771 7.42416528 8.17329344 8.40514369 7.67508186 8.3663703 8.32845107 8.37562963 7.86787149 7.7935868 8.37793112 8.18004072 7.90396563 6.84585917 8.58858319 7.27517232 7.50328963 8.48879372 7.21376831 7.9359451 8.58671925 8.38708451 8.31630025 8.61974978 6.88959131 7.59839933 8.278936 8.2763947 7.54644627 8.70781355 8.34759041 7.7890404 7.74283596 7.87207398 7.89692466 7.93844555 8.32603269 7.63240113 7.15305163 8.21473583 7.49831587 7.94271754 7.32778054 6.39692966 7.8624972 8.25582843 8.12029131 8.43598314 7.16703788 8.39162997 7.78862607 8.32360844 6.5220928 7.65681009 7.89282553 7.52402142 7.96935774 7.81722279 8.00403151 7.96728018 6.19440539 7.36010397 7.95472333 8.44246965 7.90617884 7.59135705 8.77955746 6.46146818 8.56407678 8.33806653 7.3498737 8.83637393 8.58110652 7.83636976 8.97208318 8.07402622 8.42726848 7.45356187 6.90575328 7.88532924 8.00670085 7.68720297 8.55448898 7.13249755 8.44677073 7.42476176 8.29404964 8.40065938 7.82644314 7.26192709 7.10002717 8.43381158 7.39510755 8.09132127 8.78920309 8.62155321 8.07589363 7.74109909 7.64587583 7.66575343 8.42945428 8.37332282 7.1172055 9.06762407 7.51370925 7.74543561 8.27129265 7.83122021 7.98173329 7.71333789 8.6070339 8.35702444 8.55062797 8.52714352 7.61972421 7.46221494 7.49443022 8.51318517 7.68156036 8.44891435 7.74537867 7.88419993 8.41847722 8.38022734 8.01565761 8.5754621 8.26924452 8.55256034 6.99576616 7.58832368 7.56837927 7.93454761 8.54675199 8.28652137 7.88795934 7.81399568 8.57168138 7.38150189 8.59415423 8.4446225 7.82484569 7.85360481 7.90765159 7.85438121 7.54908271 8.47428569 8.25452888 5.94017125 7.61332498 6.96696714 7.01031187 8.03203531 7.89058253 8.05452261 8.57357352 6.99850964 8.632306 8.09681747 6.97354302 7.47590597 8.36287583 7.30249642 7.40427912 7.68063743 8.51117512 7.42535789 7.70391021 8.64822145 8.56978564 7.49997654 8.70615929 7.64730883 8.25088114 7.13489085 7.87891291 6.72503364 8.04494705 8.03495502 7.14203657 8.31016902 8.71111388 8.69784669 7.46565531 7.67089483 8.33327035 7.72001794 8.68777949 8.44031215 7.02908756 7.79800105 7.85476918 7.71110125 8.66042736 6.38012254 8.51719319 7.93487157 8.15708379 8.8607829 8.89151157 6.34563636 8.60337089 8.71768205 8.67077228 7.28138566 8.91462613 7.59034695 7.93343839 7.63771643 6.99209643 7.47477218 7.55695057 8.74193546 7.23849684 7.55171222 7.68753877 7.93666016 8.63052188 8.75935475 7.68257952 8.25009775 7.76330638 7.67925143 7.69984241 7.3472997 7.62266395 7.88945915 7.82284529 8.45318786 9.39681994 7.91498301 7.28000825 7.60539236 8.47010158 8.16791936 8.50512061 7.61035762 7.69393733 8.54286094 8.61613314 7.40610338 8.12148037 6.04025471 8.66905554 7.20563518 7.48717369 8.20385137 6.73815249 7.6746175 7.95366978 8.32675881 8.45105339 7.60787807 8.40737833 7.55590509 7.80016307 7.78239034 7.18159194 8.14438887 8.47845236 7.87397838 8.09376776 7.71912984 9.17367639 8.02747653 7.24136628 8.66215896 7.59689444 7.63723439 6.1737861 7.52886926 7.54380287 7.03570941 7.77569575 7.61628356 8.95286414 7.38585108 8.22469956 8.6143199 7.42595366 7.830823 8.83054301 7.88344635 8.09437844 6.08677473 8.05038445 7.19518732 7.73324565 7.65207075 7.41095188 7.59890046 7.78113851 7.69530313 7.03384633 8.06840296 7.40184158 7.79852305 6.80350526 7.31055016 8.53306654 8.2303108 7.30842542 7.24208236 7.15070146 7.31455283 7.6989362 7.16626597 8.42288251 8.59470963 8.06495089 7.82244473 8.68946441 7.26892013 7.20934026 8.76405327 8.59785109 7.99328233 7.30586003 7.69256965 8.04974629 7.67600993 9.12695876 7.06902343 7.91717199 7.64635372 8.63675243 7.48324442 7.72621265 8.91193434 8.79936008 7.62851763 8.46168048 7.51425465 8.6206519 8.66388757 7.75241721 7.13568735 8.58485184 8.78722033 7.32939056 7.69666708 8.54090972 8.20903627 7.93128476 8.78109474 7.94129557 7.64252413 8.16820293 7.90728361 8.65347081 7.84267147 8.89836561 7.76972479 7.75405264 8.38935982 7.52131798 7.77275272 7.47136309 7.26332962 7.99934295 7.75961415 7.81520706 8.09590353 7.675546 8.56598336 8.76872982 8.52912176 8.31213511 7.7857209 8.10651452 7.78945457 7.53689713 8.66733585 7.21890971 6.86901445 8.71358201 9.19927942 7.75833347 6.97728134 7.87549929 7.83557925 8.91998807 7.2882444 7.57353126 8.41072095 6.80461452 7.81318727 7.80302664 8.07277933 8.62766064 7.79276172 7.18690102 8.67248608 8.53895468 7.31920246 7.52023456 7.75319427 8.61794309 7.02817297 7.38709024 7.23056315 6.8956827 6.41345896 7.58120983 6.95463886 7.97418867 7.61775958 5.95064255 8.2136527 8.06211758 8.98719682] **************************************************************************************************** Unique Values of log_lot_measure are [7.95717732 8.66905554 8.84043544 ... 8.92864037 9.17253466 9.29339393] **************************************************************************************************** Unique Values of log_ceil_measure are [7.51207125 7.22983878 7.02997291 7.10660614 6.88755257 8.49494758 7.96206731 7.94803199 7.39633529 6.96602419 7.138867 6.90775528 7.6778635 7.47873483 7.27239839 7.1623975 8.13153071 7.8709296 8.23747929 8.40065938 8.24275635 5.96614674 7.29979737 7.61085279 7.39018143 6.62007321 7.52294092 7.85941315 8.30893825 7.66856111 7.95155933 7.12286666 7.7450028 6.97541393 7.37775891 7.46164039 7.47306909 7.32646561 7.81601384 7.25841215 7.13089883 6.99393298 6.77992191 7.15461536 7.22256602 6.85646198 6.98471632 7.93379687 7.97590836 7.64491934 7.3395377 7.00306546 7.26542972 6.91770561 7.72312009 8.2401213 8.02289687 8.39389498 7.14677218 7.17011954 7.06475903 8.16337132 7.05617528 7.60589 7.20785987 8.25842246 7.65444323 7.76217061 7.40245152 7.09007684 7.38398946 7.44424865 7.70526247 6.95654544 7.08170859 7.70975686 7.79729127 7.41457288 8.03915739 7.20042489 7.74932246 7.31322039 6.93731408 7.45587669 7.3714893 6.82437367 7.8038433 7.68248245 7.69165682 7.49554194 6.92755791 6.73340189 7.33302301 6.86693328 7.50659178 6.83518459 7.36518013 7.25134498 6.69703425 7.59588992 8.46379241 7.34601021 7.75362355 7.48436864 7.04751722 7.25770768 7.98616486 7.58069975 7.60090246 6.63331843 7.27931884 8.29903718 6.57925121 8.01301211 7.57044325 7.18538702 8.0677762 7.36770857 7.46737107 7.86326672 6.74523635 7.09837564 7.3065314 6.89770494 6.67203295 7.90470391 7.11476945 7.05531284 8.22147895 7.4899709 8.11999383 7.58578882 7.3524411 7.79564654 8.07090609 6.70930434 7.31986493 7.7406644 8.06463648 8.05515773 8.18311808 7.24422752 7.40853057 7.3588309 8.19973896 8.01961279 7.45007957 7.07326972 7.56008047 7.62070509 7.53369371 7.82204401 7.96554557 7.89729647 7.8671055 7.48549161 7.63046126 7.59085212 7.43248381 7.65917137 7.79975332 7.62559507 7.54960917 8.26873183 7.29301768 8.09254526 8.26359043 7.80791663 7.85554468 8.33567131 7.9373747 8.00302867 7.82404601 7.64969262 7.82803803 8.07713664 8.11072758 7.55485852 8.03592637 6.87729607 7.21523998 7.77485577 7.75790621 7.03878354 8.09864284 6.94697599 8.17470288 7.54433211 6.30991828 7.9157132 8.15190987 7.78322402 8.60153434 6.64639051 6.60665019 6.80239476 7.91205689 7.81197343 7.28619171 7.96901178 7.83201418 7.73193072 7.01211529 7.02108396 7.19293422 7.50108212 7.98956045 7.56527528 8.20521843 7.17778242 8.04878828 8.20248245 8.58110652 7.72753511 7.71423114 7.67322312 6.98841318 8.05070338 7.87473913 6.84587988 7.83597458 8.16166045 7.42654907 7.9266026 7.60439635 7.84384864 8.27129265 8.02617019 6.42971948 8.17751582 7.64921632 6.75693239 8.35231855 7.01750614 8.33086361 7.90100705 8.14902387 8.31139828 8.56407678 8.19146305 8.36869318 8.18032087 8.22951112 7.53902706 8.30152165 6.49223984 7.70074779 8.06148687 8.15478757 8.0163179 8.5310961 6.5220928 7.89357207 6.65929392 7.8860814 8.29654652 8.20794694 7.7186855 7.51752085 7.41216033 8.11969625 7.77064523 7.42057891 7.04228617 7.91935619 8.48260175 7.57558465 8.37101068 8.25582843 6.5366916 8.2890371 8.21066803 6.8134446 8.04237801 7.68708016 7.52833177 8.16051825 7.85166118 8.2839993 8.52516136 8.14322675 7.66387726 8.43814998 7.92298596 7.84776254 8.23483028 7.43838353 7.50218649 6.79122146 8.13446757 7.77904864 7.23705903 6.25382881 8.17188201 7.61579107 7.63530389 8.0955987 8.00969536 7.79152282 7.88231492 7.99967858 8.34045601 8.07402622 8.1942293 7.73455884 7.93020621 7.9793389 7.88983375 7.7363071 8.08948247 6.68461173 8.10772006 6.76849321 7.4465851 7.83991936 7.30921237 6.7214257 7.73979446 7.95507427 8.26100979 8.49902922 8.19698793 8.30399997 6.91572345 7.89803969 7.69621264 8.16905315 7.9585769 8.11372609 8.03268488 8.04558828 7.06047637 8.34521793 8.10470347 7.99395755 7.86365127 8.74671635 6.32793678 7.7664169 7.97453284 7.97246602 7.99294455 7.89170466 8.2763947 8.52118521 8.08209328 6.50727771 8.45531779 8.384804 8.28652137 6.3630281 6.71901315 8.34995727 8.21878716 7.90838716 8.08023742 8.23217424 8.13739583 6.44571982 6.55108034 7.9827577 7.90137735 8.31385227 8.05832731 8.35936911 8.27026911 6.13122649 5.66988092 8.07806788 6.79458658 7.38461038 8.10167775 7.45471995 7.8785342 7.40974195 8.14031554 8.12266802 8.21338174 7.95997453 8.44891435 7.57763383 7.78738203 8.31874225 7.79317435 7.94093976 8.47219583 7.49220304 6.56526497 7.63433724 8.18590748 7.68017564 8.1285852 8.38022734 8.54869186 7.83161728 9.14952823 7.82003799 7.88457651 8.08641028 8.02355239 8.49699048 8.4658999 8.15765702 7.94449216 8.3428398 8.60886038 7.7419679 7.59287029 7.89020821 7.64012317 6.19440539 7.24779258 7.25347038 8.16621627 7.7151236 8.29404964 7.38274645 8.94637483 7.40913644 7.87966991 7.59438124 8.14612951 6.59304453 7.30047281 8.08332861 8.39840966 8.4250779 8.00836557 8.52714352 8.02780285 8.22416351 8.36404201 6.27287701 6.38012254 8.05197808 7.21155673 8.22684089 7.84149292 7.56060116 7.3639135 6.47697236 8.50309427 7.85088266 8.12563099 7.58273849 7.42416528 7.67508186 8.37562963 7.86787149 7.7935868 8.18004072 7.90396563 8.36170829 8.56216656 8.58858319 7.26332962 8.02943284 7.27517232 6.85012617 7.21376831 7.9359451 8.48879372 8.28147086 6.69208374 8.2480057 8.2160881 8.00636757 7.7890404 6.39692966 7.74283596 8.27384693 7.87207398 7.89692466 7.93844555 8.32603269 7.63240113 6.90975328 8.21473583 8.24538447 8.11671562 7.49831587 7.94271754 7.12929755 7.8624972 8.12029131 8.43598314 7.16703788 7.78862607 8.32360844 7.65681009 7.89282553 7.52402142 7.96935774 7.81722279 7.71110125 8.39615486 8.18868912 7.96728018 6.63987583 7.36010397 8.29154651 7.95472333 7.90617884 7.59135705 8.55256034 6.34563636 8.32845107 6.46146818 7.3498737 7.83636976 8.97208318 8.25062008 8.42726848 7.45356187 8.35467426 6.6821086 7.88532924 8.00670085 8.55448898 6.89972311 8.44677073 7.42476176 7.82644314 8.57734711 7.26192709 7.10002717 7.39510755 8.09132127 8.54733435 8.48673398 8.07589363 7.74109909 7.64587583 7.66575343 8.42945428 8.25322765 6.87005341 8.71931738 7.51370925 7.74543561 7.83122021 7.98173329 7.71333789 8.4316353 8.26616444 8.35702444 7.61972421 7.46653916 8.37793112 7.2765564 7.99631723 7.49443022 8.51318517 7.68156036 8.31630025 7.46221494 8.32117831 7.88419993 8.01565761 7.93092537 6.99576616 7.58832368 8.44246965 7.56837927 7.88795934 7.81399568 7.38150189 8.59415423 7.82484569 7.85360481 8.51919119 7.90765159 7.85438121 8.40960798 7.54908271 8.47428569 8.25452888 5.94017125 8.39162997 7.61332498 6.96696714 7.01031187 8.03203531 8.05452261 8.42068229 6.99850964 8.09681747 6.97354302 7.47590597 8.15622332 7.30249642 7.40427912 7.68063743 7.42535789 7.70391021 8.40290405 7.49997654 7.16626597 8.25088114 7.13489085 7.87891291 6.72503364 8.04494705 8.37332282 8.03495502 6.99759598 7.14203657 8.31016902 8.71111388 8.69784669 7.46565531 8.30647216 7.67089483 7.23273314 8.33806653 7.72001794 7.02908756 7.85476918 8.66042736 8.51719319 8.33327035 7.93487157 8.15708379 6.1737861 8.7671734 8.60337089 8.71768205 8.67077228 8.56025268 7.28138566 8.62155321 7.19218206 7.93343839 7.59839933 8.34759041 8.41183268 7.63771643 6.99209643 7.47477218 7.55695057 8.48467 7.23849684 7.55171222 7.68753877 7.93666016 8.75935475 8.25009775 8.278936 6.41345896 7.67925143 7.69984241 7.3472997 7.62266395 7.88945915 7.82284529 9.05602301 7.91498301 7.28000825 7.60539236 8.16791936 7.61035762 8.54286094 7.40610338 7.99496952 8.3663703 8.59969441 6.04025471 7.20563518 7.37462902 7.68432407 6.73815249 7.6746175 7.58882988 8.32675881 8.38708451 7.60787807 8.40737833 7.55590509 7.80016307 7.78239034 7.18159194 8.43381158 6.90274274 8.47845236 7.87397838 8.09376776 8.48052921 8.02747653 7.24136628 7.16084591 7.63723439 7.54380287 7.77569575 7.61628356 8.80387476 7.38585108 7.98344006 7.42595366 7.830823 7.88344635 8.09437844 6.08677473 8.05038445 7.19518732 7.73324565 7.65207075 7.41095188 7.59890046 7.78113851 7.69530313 8.06840296 7.29912146 8.40514369 7.79852305 6.80350526 8.53306654 8.11252776 7.24208236 7.15070146 7.31455283 8.41405243 6.76388491 7.00760061 8.59470963 8.06495089 7.82244473 8.50714286 7.26892013 7.20934026 8.36287583 7.99328233 7.30586003 7.69256965 7.83715965 7.67600993 8.73230457 7.06902343 7.91717199 7.64635372 7.48324442 7.72621265 8.91193434 6.95463886 7.51425465 8.38251829 8.19007705 7.13568735 7.69666708 8.20903627 7.93128476 7.94129557 8.16820293 7.90728361 7.84267147 7.64730883 8.89836561 7.75405264 8.38935982 7.52131798 7.77275272 7.47136309 8.44031215 7.99934295 7.81520706 8.09590353 7.675546 8.56598336 8.76872982 8.52912176 8.70781355 7.35500192 8.31213511 7.7857209 8.10651452 7.78945457 7.53689713 6.86901445 8.71358201 9.08930204 7.75833347 6.97728134 7.87549929 7.83557925 7.2882444 7.57353126 6.80461452 7.81318727 8.54480836 7.80302664 8.07277933 8.62766064 7.79276172 7.18690102 7.62851763 7.31920246 7.52023456 7.75319427 8.61794309 7.38709024 7.23056315 6.8956827 7.58120983 7.97418867 5.95064255 8.2136527 8.06211758 8.81284343] **************************************************************************************************** Unique Values of log_basement are [0. 6.39859493 6.52356231 6.89871453 6.96696714 5.70711026 5.30330491 6.68586095 6.94793707 5.91620206 6.90875478 6.47850964 6.66057515 6.11146734 6.58063914 7.52348131 6.64768837 6.23636959 6.43133108 7.61134772 6.78105763 6.31173481 6.73459166 6.55250789 6.19644413 5.67332327 7.2730926 6.91869522 6.76964198 6.92853782 6.80350526 6.27476202 7.44483327 7.05703698 6.49375384 6.38181602 7.16317239 7.00397414 6.60800063 5.88887796 7.10742547 6.79234443 7.25205395 7.42714413 5.33753808 6.67329797 5.73979291 4.7095302 6.93828448 5.3981627 4.51085951 6.63463336 7.02197642 7.03085748 7.1553963 7.35946764 5.14166356 6.88857246 6.99484999 6.44730586 6.08904488 7.03966035 6.69826805 6.85751406 5.99396143 6.17586727 6.87832647 5.77144112 6.86797441 7.12367279 5.56452041 6.84694314 7.08254857 6.8145429 6.25575004 6.56667243 7.29369772 6.06610809 6.83625928 7.37838371 6.59441346 7.19368582 5.01727984 5.52545294 6.15485809 6.36475076 6.04263283 6.95749737 7.09090982 4.39444915 7.43307535 6.29341928 6.72262979 7.31388683 7.23056315 5.48479693 7.41517511 7.40913644 6.01859321 4.79579055 7.06561336 6.46302946 6.50876914 5.80211838 7.38461038 6.2166061 6.97634807 6.02827852 7.24494155 7.30047281 4.61512052 7.52886926 7.8713112 7.46221494 7.33040521 5.63835467 7.25911613 6.98564182 7.09920174 6.13339804 7.20860034 6.74641213 7.51261754 6.34738921 5.96870756 7.36581284 7.01301579 7.04838641 5.60211882 7.23777819 5.83188248 7.67832636 5.35185813 5.94279938 5.19849703 4.94875989 7.39079852 4.26267988 6.41509696 6.32972091 7.17088848 6.76041469 6.82546004 6.62140565 7.55538194 7.1276937 7.3336764 7.49609735 6.71052311 7.11558213 7.1861443 7.13966034 7.28000825 5.58349631 5.420535 5.25227343 5.08140436 7.13169851 7.26612878 7.07411682 6.7580945 7.32052696 7.34665516 7.20117088 7.90875474 5.86078622 3.93182563 4.87519732 7.45645456 6.53813982 7.76259605 7.17854548 7.14755927 7.68294317 7.22329568 7.35308192 6.37842618 7.57095858 7.4506608 8.32627479 7.63578686 7.48493028 7.65964295 7.50714108 7.15617664 7.215975 7.40306109 6.4019172 7.42117753 5.44241771 7.6501687 7.46794233 7.30720231 7.53422833 7.76684054 3.71357207 6.81014245 7.47363711 6.50279005 7.28687641 4.98360662 7.54486107 7.6453977 6.84161548 7.34018684 5.49716823 7.37211803 5.17614973 5.46383181 5.92692603 7.55013534 5.62401751 4.11087386 5.6167711 7.63094658 7.43897159 7.94129557 7.32712329 7.50163446 7.69666708 7.84424072 2.68742928 5.69035945 2.39789527 7.4905294 7.70571282 6.48157713 7.95542509 5.58724866 7.75405264 7.67368813 7.60638739 7.53955883 7.91242312 7.74109909 7.58120983 8.15507489 7.0352686 7.57609734 7.30114781 6.24610677 7.71467747 6.07764224 6.75925527 8.48073665 6.5581978 5.64897424 7.51806418 7.82043952 7.3969486 7.65491705 5.5174529 6.03548143 7.71912984 7.85979918 8.00670085 7.15148546 7.5963923 6.91671502 8.08978918 6.16541785 7.47929964 6.66568372 7.7836406 6.82001636 7.13009851 6.24804287 7.86748857 6.77536609] **************************************************************************************************** Unique Values of log_living_measure15 are [7.52294092 7.3395377 7.04751722 7.62559507 7.47873483 8.66388757 7.98616486 7.29979737 7.25134498 7.51207125 7.3524411 7.40245152 7.42654907 7.39018143 7.59588992 7.27239839 7.50108212 7.82803803 6.80239476 7.49554194 8.14612951 7.7450028 8.278936 7.68248245 7.74932246 7.50659178 7.28619171 8.03268488 7.58069975 8.29903718 7.85554468 7.57558465 7.138867 7.92298596 7.31756736 7.57044325 7.32646561 7.59085212 7.17011954 8.00302867 7.60090246 7.47306909 7.61579107 7.4899709 7.88231492 7.9266026 7.46164039 7.41457288 7.22256602 7.61085279 7.63530389 7.02820143 7.71423114 7.02108396 7.9157132 7.55485852 8.16337132 7.48436864 7.3714893 7.87473913 8.08332861 7.67322312 8.06148687 7.70975686 7.85941315 8.01301211 7.53902706 7.56008047 7.19293422 7.38398946 7.65917137 7.44424865 7.29301768 7.52080717 7.79729127 7.72312009 6.98471632 8.15478757 8.31874225 7.21523998 7.62070509 7.79152282 7.58578882 7.09837564 7.89729647 7.54960917 7.84384864 7.25841215 7.7186855 6.86693328 7.43248381 7.68708016 7.39633529 7.27931884 7.36518013 7.31322039 8.4763712 7.77064523 7.89357207 8.07090609 7.24422752 7.75790621 7.84776254 6.93731408 7.12286666 7.17778242 7.20042489 7.03878354 7.78530518 6.94697599 7.83597458 7.20785987 8.29654652 7.36770857 7.63046126 7.15461536 7.37775891 7.26542972 7.09007684 8.0163179 6.92755791 8.04302089 7.3065314 7.69621264 7.52833177 7.77904864 7.45587669 7.02997291 7.85166118 7.33302301 7.91205689 8.00636757 8.03592637 7.23705903 7.66856111 8.12266802 7.45007957 7.13089883 7.82404601 7.11476945 7.80791663 7.7406644 7.34601021 7.82003799 7.96554557 7.40113042 7.76217061 7.7664169 8.11072758 7.31986493 7.72753511 8.2480057 7.53369371 7.40853057 7.14677218 7.60589 8.0955987 7.10660614 7.8038433 7.64491934 8.13739583 8.44891435 7.99294455 7.95155933 7.54433211 8.23483028 7.8860814 8.04878828 8.07713664 8.05515773 7.99967858 7.93379687 7.56011916 7.06475903 6.97541393 7.42057891 7.00306546 7.43838353 8.3663703 8.18590748 7.4192731 8.48260175 6.60665019 8.12563099 7.06902343 7.81197343 8.05197808 6.96602419 7.79564654 7.64012317 7.83991936 7.99631723 7.73193072 7.78738203 7.9373747 7.78322402 7.51752085 7.3588309 7.01211529 7.9793389 7.69165682 8.2763947 7.44510652 7.6778635 7.22983878 6.82437367 6.98841318 7.70074779 7.88983375 7.6912001 7.97246602 7.65444323 7.18538702 8.24538447 8.18868912 8.02617019 6.42971948 8.25322765 7.64921632 8.00969536 8.01961279 8.07402622 6.89770494 6.79122146 7.39427616 8.18032087 7.46737107 7.75362355 7.9585769 8.16905315 8.34521793 8.50106381 6.8134446 7.94449216 7.93020621 7.08170859 7.90838716 8.37562963 8.04558828 8.11969625 6.75693239 7.56527528 7.23736978 7.96206731 7.90100705 7.91935619 7.97590836 7.64969262 8.39162997 8.11671562 8.2890371 7.79975332 8.03915739 6.7214257 7.47760424 7.94093976 8.09864284 8.02289687 7.98956045 7.72001794 7.8671055 7.07326972 8.14322675 8.20903627 8.43814998 7.7363071 7.77485577 8.13446757 7.7362849 8.14031554 7.66387726 7.05617528 7.34932643 6.99393298 8.22416351 8.2839993 8.17470288 8.10167775 8.26616444 7.46657277 7.04717761 7.96901178 6.88755257 8.21066803 8.08641028 7.64156444 8.05832731 7.9827577 8.02943284 7.4465851 6.95654544 6.85646198 7.8709296 7.90470391 8.30893825 7.30921237 7.83201418 7.81601384 6.91770561 8.50512061 8.30152165 8.10772006 8.41405243 7.03174126 7.85321639 8.0677762 8.15765702 7.24779258 7.76387129 7.35994684 6.65929392 7.69905059 8.13153071 7.25488481 8.28147086 8.04237801 6.63331843 7.88795934 8.71768205 7.6182373 7.41697962 6.62007321 6.70930434 8.35231855 8.20521843 7.70526247 8.50309427 8.08209328 6.77992191 7.60638739 7.52940646 8.27384693 8.08023742 8.19698793 8.34045601 8.17751582 6.71901315 7.47461214 8.31630025 8.5564139 6.87729607 7.1623975 8.01895468 8.24275635 7.85010355 7.40657096 8.23217424 8.16051825 7.89095672 7.8785342 7.59034695 7.64730883 6.74523635 8.17188201 8.10470347 8.19146305 8.33086361 8.11372609 7.80971687 7.40256168 8.47219583 7.79110951 6.5366916 8.1942293 7.77527585 8.30399997 8.17667277 6.56526497 7.41298036 8.20248245 8.08948247 8.46800295 7.47309227 8.48673398 7.15617664 8.1285852 7.86326672 8.19973896 7.56229203 7.24351297 7.83122021 8.30647216 8.14902387 8.20576647 7.92551898 8.14753176 7.92974215 7.95507427 7.2775682 7.85786756 8.22684089 7.0800265 8.09254526 8.00503334 7.2758646 7.94803199 8.06463648 7.30988149 8.26359043 7.60539236 8.20794694 7.03702761 8.31385227 7.21862206 7.7857209 7.36264527 7.30047281 7.91132402 8.22951112 7.40184158 8.25842246 8.38708451 7.76556908 8.16621627 7.37622033 7.56060116 6.99576616 7.14203657 6.73340189 8.44031215 7.63867982 7.48484143 7.8965527 7.84267147 8.44677073 8.22147895 7.48336405 7.61431215 7.71408849 7.80425138 7.47136309 7.74283596 7.91022371 7.52448391 7.79317435 7.3284439 7.65539064 7.67600993 7.82484569 6.90775528 8.32117831 7.21376831 8.18311808 7.7151236 7.43108503 6.84587988 6.76849321 7.51425465 7.18992217 7.27904402 8.33567131 7.39086206 7.60290046 8.43598314 8.0580108 8.21338174 8.04798801 6.69703425 8.32603269 7.63720305 8.33327035 7.93666016 7.32778054 8.22013396 8.21878716 7.60787807 7.68063743 7.86346415 7.63771643 7.60430751 6.68461173 7.38274645 7.34689338 7.26192709 7.59135705 7.74240202 7.85670679 7.7419679 8.11222796 7.84541604 8.29154651 8.33806653 8.45318786 8.4316353 7.864804 7.91809735 8.63052188 8.40960798 7.35819375 7.65633717 8.4446225 6.90575328 7.85476918 8.19428648 7.42476176 6.67203295 7.67042852 7.71244383 8.34759041 8.29404964 7.21498339 8.3044949 7.158514 7.13568735 8.03495502 8.39389498 7.54221346 6.57925121 8.25582843 7.94271754 7.64587583 8.32845107 7.29029288 7.86901938 7.80710329 8.15190987 7.99530662 7.80628929 7.20846256 7.45408491 8.42726848 7.87625888 7.24649056 8.38935982 7.68156036 8.36404201 7.60190196 8.25062008 8.34995727 7.96519829 6.83518459 7.58832368 7.20934026 7.14424571 7.63961199 8.3990851 5.98896142 7.37042489 7.81399568 7.32383057 7.78821156 7.49443022 8.19560957 7.86211221 7.85360481 7.19743535 7.88870952 7.85449309 8.40514369 7.24770468 7.11314211 7.77653503 7.21303166 8.28652137 7.87891291 8.50714286 7.05260268 7.27116189 8.13757493 8.4250779 7.8458075 6.98286275 7.47590597 7.51768466 7.30249642 7.34342623 7.81359155 7.2772956 7.99328233 8.2160881 8.36869318 7.17548971 8.53895468 7.98173329 7.5063621 7.36010397 8.45105339 8.32360844 7.27170371 7.21103233 8.26100979 7.76872582 7.48773376 7.28482091 8.41847722 7.59697113 7.8811822 7.54644627 7.34213173 7.30672707 7.31920246 7.66806771 7.93487157 7.65164689 8.5310961 8.47010158 6.13122649 7.70571282 7.26323088 8.58298093 7.38742153 8.46379241 7.78155596 7.57715267 7.75190533 8.56025268 7.00124562 7.47477218 7.5682898 7.49058324 7.75491027 7.17242458 7.45471995 7.89989532 8.46168048 8.06808963 7.46508274 7.4175804 8.27129265 7.17165682 8.26873183 6.93430207 7.27793857 8.23747929 7.88945915 7.65396918 7.32974969 7.63531901 7.31993895 7.99245759 7.91498301 7.75619534 8.1994642 7.15790752 8.49699048 7.52563998 7.57198845 7.47075968 6.96224346 7.86518795 8.37101068 7.24279792 7.86787149 8.25218544 7.34089037 7.90985667 7.43602782 7.49676959 8.03207095 8.19367667 7.60837447 7.95437227 8.43381158 7.73813862 7.29717723 7.95472333 7.46450983 8.2401213 8.35467426 7.8777457 8.42068229 7.20489251 7.58624379 7.77569575 7.83042562 7.96415572 7.26332962 7.88344635 7.18433018 8.05617968 7.50347129 8.00903069 7.69888629 8.39840966 7.30787278 7.59604745 7.74716497 7.61480536 8.3428398 7.78945457 7.63101973 7.66152708 7.41095188 8.4658999 7.57426752 7.51324634 7.69530313 7.79852305 7.92479591 8.15880249 7.90036104 7.2276625 7.16626597 7.31455283 6.55108034 7.62851763 7.82244473 7.29057938 7.70255611 7.33758774 7.83715965 7.55695057 7.44307837 8.22309055 8.632306 7.41454746 7.17701877 7.6453977 7.35627988 8.40737833 7.830823 7.74759684 7.8624972 7.88021877 7.38770924 7.48953907 7.81923445 7.80016307 7.52131798 7.10332206 7.27447956 7.49831587 7.75961415 7.72841953 7.73149203 7.675546 8.1727291 7.79063954 7.23927721 7.40905827 7.17168238 7.96311206 8.36170829 7.17319174 7.47703847 7.53689713 8.12512476 7.94520113 7.30981612 8.31139828 7.5370664 6.93991587 8.41072095 7.61923342 7.62657021 7.42816725 7.84648843 7.27378632 7.54134026 7.83478811 7.51762444 7.33414443 8.52317526 7.82204401 7.52056845 8.00117094 7.37314717 7.42535789 7.40123126 7.34858753 7.62938816 7.69484807 6.9048256 7.14519613 7.66949525 7.56734568 7.21817684 7.68478394 7.23056315 7.67108871 7.47110708 7.72134861 7.58695719 7.49108759 7.58120983 7.3568337 7.05272105 7.9497048 8.13885675 7.90248744] **************************************************************************************************** Unique Values of log_lot_measure15 are [ 7.88870952 8.31287139 8.84043544 ... 8.8730478 10.75883861 9.02869858] **************************************************************************************************** Unique Values of ceil_1.5 are [0. 1.] **************************************************************************************************** Unique Values of ceil_2.0 are [1. 0.] **************************************************************************************************** Unique Values of ceil_2.5 are [0. 1.] **************************************************************************************************** Unique Values of ceil_3.0 are [0. 1.] **************************************************************************************************** Unique Values of ceil_3.5 are [0. 1.] **************************************************************************************************** Unique Values of coast_1.0 are [0. 1.] **************************************************************************************************** Unique Values of furnished_1.0 are [0. 1.] **************************************************************************************************** Unique Values of year_sold_2015 are [0. 1.] **************************************************************************************************** Unique Values of warm_month_sold_1.0 are [0. 1.] **************************************************************************************************** Unique Values of zip_price_cat_medium_price are [1. 0.] **************************************************************************************************** Unique Values of zip_price_cat_high_price are [0. 1.] **************************************************************************************************** Unique Values of basement_category_Small Basement are [0. 1.] **************************************************************************************************** Unique Values of basement_category_Large Basement are [0. 1.] ****************************************************************************************************
# Checking testing data's unique values to make sure they match expectations after imputation and rounding
columns = x_test_im.columns
for col in columns:
print('Unique Values of {} are \n'.format(col),x_test_im[col].unique())
print('*'*100)
Unique Values of room_bed are [3. 4. 2. 3.4 5. 3.6 6. 3.2 3.8 1. 8. 7. 2.8 2.2 9. 4.2 5.4] **************************************************************************************************** Unique Values of room_bath are [2.75 2.5 1. 3.5 2. 1.75 1.5 2.45 2.25 0.75 3. 3.25 4.5 4.75 3.75 5.25 5. 4. 6. 1.4 1.1 4.25 1.7 5.5 1.6 5.75 0.5 2.45 1.55 2.55 1.85 1.15 1.3 1.9 2.3 2.05 6.25 3.65 1.65 3.25] **************************************************************************************************** Unique Values of living_measure are [1950. 2340. 3360. 2310. 800. 1320. 3600. 1540. 1130. 2430. 1240. 1920. 1410. 950. 1390. 1870. 1330. 2290. 2300. 1480. 1280. 2460. 1580. 1670. 780. 1880. 1610. 1300. 1820. 2020. 2560. 1940. 2060. 830. 2130. 2470. 2410. 1680. 1890. 2570. 1930. 1070. 1190. 3040. 3290. 1010. 1740. 2090. 810. 1760. 2360. 1600. 1660. 4570. 1530. 2390. 1990. 2680. 3370. 1960. 3030. 1230. 880. 1490. 5120. 2320. 2210. 2440. 2990. 1750. 1860. 1640. 1850. 1650. 1050. 1780. 2420. 4470. 720. 3940. 3340. 1840. 2550. 6810. 2000. 2620. 1700. 2240. 1630. 1910. 1060. 2120. 2150. 3020. 5090. 1090. 2160. 6055. 1690. 2930. 1150. 2588. 2230. 1510. 760. 1180. 1570. 1520. 2070. 2040. 1140. 3760. 1720. 1970. 1310. 1790. 2520. 1830. 2030. 840. 1200. 1550. 1210. 3570. 2170. 4120. 1440. 1360. 2260. 2029. 2960. 5461. 2720. 2700. 2820. 3610. 1250. 1160. 3110. 1657. 2730. 1420. 1810. 2980. 2500. 2400. 2760. 920. 2670. 2800. 3410. 1980. 1470. 1100. 3810. 4910. 3210. 2200. 2100. 5490. 3970. 1260. 3460. 3800. 910. 1340. 2710. 1290. 3180. 3510. 2110. 3400. 1560. 2600. 2280. 2220. 1500. 1400. 3520. 770. 900. 2480. 1800. 1770. 4168. 3710. 3670. 1020. 960. 1370. 1380. 2920. 2180. 970. 2080. 2810. 1430. 3650. 2495. 1730. 1764. 2590. 1000. 2450. 4500. 3090. 3010. 750. 990. 4340. 3550. 980. 1270. 1900. 1170. 2490. 1078. 3354. 2840. 998. 3140. 3320. 2860. 4490. 2890. 670. 500. 3680. 1255. 4690. 1220. 2140. 3000. 3900. 1110. 4560. 4010. 3960. 2594. 3260. 700. 2740. 2750. 2270. 3190. 3200. 3080. 2190. 3270. 1030. 2370. 2770. 1590. 1620. 1715. 2610. 2250. 1460. 4440. 3470. 930. 2780. 2790. 2380. 2580. 1350. 3840. 2900. 1120. 2540. 3770. 600. 890. 3150. 4210. 2050. 2910. 820. 2640. 2690. 4083. 3530. 1986. 850. 2350. 6530. 3240. 2010. 3060. 2970. 3070. 1040. 2650. 3990. 2330. 3176. 2830. 1553. 3100. 3630. 790. 1769. 2510. 3004. 4130. 4230. 4400. 3450. 3350. 3660. 3230. 5844. 3910. 8010. 1811. 5670. 860. 2630. 1413. 3430. 2660. 3850. 1450. 3690. 3050. 3056. 580. 5300. 940. 4060. 3830. 2584. 590. 410. 894. 5760. 1710. 3160. 3580. 3480. 6670. 7220. 710. 3890. 1752. 5050. 4190. 2870. 1689. 3170. 2880. 3540. 3220. 4800. 2850. 2940. 6240. 3500. 3595. 4770. 2798. 620. 4300. 4750. 809. 4540. 4260. 3490. 2530. 4360. 4040. 3620. 2257. 4310. 3820. 1864. 4420. 3860. 1714. 3130. 7000. 3880. 3640. 2245. 2950. 2506. 1601. 4645. 1528. 2356. 3310. 3420. 4700. 2313. 3750. 3120. 3870. 4170. 3217. 570. 4850. 1824. 2432. 4660. 1982. 3780. 3920. 4030. 1080. 1509. 2196. 4410. 2557. 1313. 3950. 1122. 4380. 3440. 5960. 740. 4680. 5180. 4000. 4610. 1358. 3250. 3700. 3045. 2163. 1076. 5740. 4115. 1651. 5067. 7620. 1232. 5000. 5710. 4920. 4960. 4080. 730. 3002. 2481. 1445. 5320. 3280. 1489. 3930. 3730. 4290. 2456. 640. 1802. 540. 680. 3380. 5370. 3597. 4520. 3266. 4020. 3590. 4870. 4070. 2755. 4110. 3560. 1384. 3390. 1494. 6490. 4710. 870. 4820. 1414. 6030. 650. 5270. 4100. 4225. 1584. 610. 4270. 4150. 4090. 2185. 1847. 4530. 5150. 1981. 4200. 5210. 2716. 2008. 4460. 1088. 3330. 5190. 2005. 5340. 2632. 660. 1408. 1458. 1934. 4510. 2403. 1516. 4620. 5450. 1131. 4180. 2514. 4640. 5550. 3555. 3720. 4740. 4160. 2734. 4140. 3740. 430. 1658. 4330. 1422. 5540. 4720. 2223. 2154. 7850. 3300. 2835. 2844. 2828. 2683. 4370. 2605. 1552. 5305. 1909. 2807. 2259. 470. 6410. 7710. 1785. 4590. 1465. 4890. 2075. 6900. 4475. 2538. 3786. 1776. 5790. 2242. 4930. 6640. 3596. 8020. 370. 2542. 4240. 6210. 4390. 1068. 5310. 1767. 1239. 7080. 4575. 4350. 4760. 2452. 520. 5100. 3915. 3847. 3790. 1594. 2244. 4550. 3202. 690. 1867. 1954. 2009. 5110. 1788. 2344. 2052. 5720. 7390. 5610. 1646. 1175. 4320. 5840. 4050. 2393. 1983. 5640. 4580. 4220. 2555. 4280. 4600. 4670. 2216. 4250. 2795. 2026. 6380. 4883. 4650. 1571. 1322. 2015. 2732. 2329. 5774. 3305. 2507. 3402. 5350. 7350. 4730. 2229. 3238. 3980. 5620. 4386.] **************************************************************************************************** Unique Values of lot_measure are [12240. 52272. 7685. ... 82328. 6033. 7495.] **************************************************************************************************** Unique Values of sight are [0. 2. 1. 3. 4. 0.4 0.8] **************************************************************************************************** Unique Values of condition are [3. 2. 5. 4. 1. 3.8 3.2 3.6] **************************************************************************************************** Unique Values of quality are [ 7. 8. 9. 6. 4. 11. 10. 13. 12. 5. 3.] **************************************************************************************************** Unique Values of ceil_measure are [1250. 2340. 3360. 1480. 800. 1090. 3600. 1540. 1130. 1410. 960. 1920. 950. 1950. 1390. 1870. 900. 2290. 2300. 1020. 1280. 1810. 1070. 780. 1100. 1610. 1300. 1120. 1080. 1940. 1030. 830. 2130. 1570. 2410. 1680. 1490. 2570. 1930. 870. 1678. 1190. 1520. 3290. 1010. 1740. 2090. 810. 1760. 1180. 1600. 1880. 1660. 4570. 2390. 1780. 2680. 1526. 1220. 1670. 1060. 880. 770. 2320. 2210. 1910. 2990. 1290. 1140. 1850. 1650. 850. 2420. 4470. 720. 1330. 3230. 3340. 1710. 2550. 6110. 2310. 2620. 1460. 980. 1000. 1170. 2150. 2230. 5090. 2160. 3555. 1860. 1150. 2588. 1230. 1340. 760. 920. 2640. 1720. 1370. 1400. 1310. 1790. 1690. 1830. 840. 1200. 2030. 2860. 2170. 3030. 3970. 990. 1360. 2260. 2029. 2440. 1500. 3265. 910. 1240. 2700. 1270. 2820. 3610. 580. 3110. 1657. 2730. 2980. 2500. 2240. 1580. 2400. 2760. 2070. 1420. 2360. 1560. 3810. 820. 1350. 2100. 3210. 1320. 2200. 2470. 5490. 3570. 2120. 2560. 3800. 1630. 1210. 670. 2710. 3510. 2110. 3400. 1050. 1430. 2600. 2280. 2220. 930. 3520. 2480. 1800. 1770. 3222. 3710. 3670. 1750. 1380. 2920. 970. 1970. 3650. 1470. 2495. 1730. 1764. 3370. 2590. 2350. 2630. 3090. 3010. 1550. 750. 3040. 1530. 3550. 1840. 1640. 1510. 2490. 1078. 3440. 1590. 2330. 1900. 798. 2950. 4490. 2020. 890. 590. 500. 1260. 1255. 3450. 860. 2140. 1442. 1110. 4560. 1820. 4010. 2594. 1960. 700. 1980. 2750. 2270. 2010. 2370. 2040. 2840. 1715. 2610. 1450. 1620. 2780. 1160. 3840. 2900. 1700. 3770. 2190. 600. 2870. 1440. 3150. 2790. 4210. 1040. 2910. 2060. 4083. 3180. 1746. 6530. 3240. 3000. 1890. 3060. 2250. 2650. 1990. 2520. 2726. 790. 1553. 2080. 3100. 3200. 3630. 2830. 1769. 2770. 3004. 4130. 2720. 4400. 2318. 765. 5844. 2890. 940. 5990. 1811. 5670. 1413. 2690. 2430. 2660. 3690. 3050. 3056. 5300. 2540. 3830. 2584. 410. 2380. 2810. 2000. 894. 4390. 3160. 3580. 2050. 4960. 3990. 6220. 710. 1752. 4190. 2800. 1689. 3540. 2510. 3220. 4800. 2970. 2530. 2940. 4610. 2930. 2670. 3500. 3595. 730. 4770. 2798. 620. 2580. 680. 4300. 4750. 809. 4200. 4260. 3530. 2180. 1363. 3140. 1864. 4420. 3860. 2880. 2850. 1714. 3130. 2460. 3880. 3640. 2245. 3620. 2506. 1536. 3855. 1528. 3170. 3310. 3420. 2313. 3750. 3120. 1206. 3870. 4170. 2587. 3430. 570. 4850. 1824. 2432. 1982. 3780. 3920. 4030. 1509. 2196. 660. 2557. 1313. 1122. 3900. 740. 2450. 4280. 3410. 4000. 1358. 3250. 4380. 4040. 3045. 2163. 1076. 4115. 1341. 3154. 5980. 1232. 2740. 5710. 3820. 2960. 3680. 4080. 3002. 2481. 630. 5320. 3190. 1489. 3070. 4290. 2456. 640. 1802. 540. 3020. 5370. 3597. 3266. 840. 3480. 3490. 4870. 4070. 2755. 3560. 1384. 1494. 3940. 3080. 3950. 1414. 4440. 650. 4100. 4225. 3320. 1584. 610. 4150. 2185. 1847. 1981. 4940. 2716. 1216. 3460. 3730. 1088. 3280. 3390. 1605. 3740. 2632. 4340. 1408. 4660. 1934. 2403. 4310. 998. 1736. 4120. 5450. 1131. 4180. 2024. 3350. 3260. 3720. 4460. 3850. 4740. 4160. 2734. 4140. 430. 3330. 4510. 1658. 1422. 4720. 2223. 2154. 7850. 2835. 2844. 2828. 3700. 1402. 2683. 4370. 1552. 3745. 1766. 2807. 1491. 470. 5610. 6090. 1785. 3590. 4500. 2075. 4820. 3930. 4475. 2538. 3270. 962. 1604. 4430. 2242. 6350. 1798. 8020. 3470. 370. 2542. 4240. 4760. 1068. 2334. 1767. 1239. 4410. 5760. 3905. 1252. 2452. 520. 4360. 1218. 3915. 1492. 2299. 3790. 2244. 1659.6 3202. 690. 1867. 2148. 1954. 2009. 5110. 1788. 1544. 2052. 5000. 4090. 1646. 4320. 3380. 1168. 3660. 2393. 1983. 4900. 3910. 2555. 4670. 2216. 3300. 2795. 2659. 2026. 6380. 3859. 5180. 1571. 1087. 2015. 2732. 2329. 2507. 5190. 2846. 2229. 6640. 2356. ] **************************************************************************************************** Unique Values of basement are [ 700. 0. 830. 230. 1020. 280. 430. 300. 650. 450. 600. 780. 900. 1280. 460. 1030. 400. 200. 360. 1520. 1180. 210. 1600. 740. 1360. 170. 720. 1830. 530. 840. 140. 710. 860. 1290. 240. 420. 1000. 80. 950. 790. 2500. 630. 1070. 340. 570. 920. 930. 1120. 290. 310. 190. 150. 670. 1460. 2196. 350. 580. 470. 440. 750. 1250. 1540. 490. 1800. 130. 640. 260. 800. 1110. 870. 500. 1060. 946. 540. 1200. 480. 330. 990. 1870. 2170. 730. 1090. 220. 380. 510. 370. 1410. 1430. 120. 560. 1320. 690. 940. 1050. 1240. 1950. 970. 850. 1330. 1300. 760. 1100. 1130. 770. 2220. 910. 1370. 270. 320. 1220. 550. 250. 680. 2580. 1170. 520. 1500. 1080. 2000. 620. 880. 660. 890. 1160. 90. 180. 590. 1040. 810. 960. 1510. 820. 1590. 265. 2020. 1420. 390. 2030. 610. 1700. 410. 1140. 1710. 1400. 1010. 1630. 1440. 1260. 1920. 160. 2150. 1190. 1660. 100. 894. 1780. 1850. 1840. 3500. 65. 50. 2040. 1580. 40. 1350. 1340. 1450. 2060. 70. 1150. 2080. 980. 110. 1930. 1230. 2130. 1270. 1913. 1640. 1390. 145. 2180. 1310. 1570. 1690. 2010. 1380. 1750. 2550. 1680. 792. 20. 2570. 508. 518. 2200. 1210. 1650. 515. 2190. 60. 1560. 143. 768. 1620. 415. 1530. 1852. 172. 1798. 2050. 1720. 1548. 506. 1550. 2390. 435. 1790. 1900. 10. 1760. 2330. 1816. 1024. 235. 1284. 1490. 556. 2310. 2600. 1480.] **************************************************************************************************** Unique Values of yr_built are [1956. 1978. 2001. 1908. 1953. 1920. 1996. 1998. 1947. 1954. 1910. 1993. 1906. 1941. 2013. 1967. 2003. 1974. 1979. 2006. 1995. 1939. 1916. 1970. 1955. 1945. 1997. 2004. 1980. 1976. 1977. 2000. 1962. 2005. 1987. 1989. 1937. 1994. 1981. 1986. 1963. 1948. 1949. 1902. 2002. 1946. 1944. 1985. 2011. 1999. 1912. 2008. 2009. 1973. 1983. 1972. 1961. 1911. 1984. 1951. 1942. 1922. 1950. 2007. 1988. 1957. 1926. 1930. 1964. 1952. 1928. 1905. 2015. 1966. 2014. 1990. 1904. 1992. 1975. 1914. 1969. 1991. 2012. 1925. 1968. 1959. 1913. 1909. 1940. 1965. 1915. 1943. 1917. 1900. 1927. 1924. 2010. 1971. 1960. 1958. 1921. 1903. 1982. 1931. 1934. 1936. 1929. 1901. 1918. 1923. 1932. 1938. 1935. 1933. 1919. 1907. 1949.4 1954.4] **************************************************************************************************** Unique Values of yr_renovated are [ 0. 1981. 1963. 1986. 1958. 2007. 2004. 2011. 1990. 2010. 2000. 2005. 1965. 2012. 1969. 2002. 2008. 1975. 2014. 1989. 1992. 1987. 1991. 1973. 1980. 1979. 1945. 1982. 2015. 1995. 1993. 1984. 2003. 2006. 1998. 1978. 1994. 1997. 2009. 1970. 2013. 2001. 1983. 1999. 1934. 1985. 1988. 1971. 1956. 1944. 1974. 1996. 1977. 1950. 1968. 1964. 1972.] **************************************************************************************************** Unique Values of lat are [47.7401 47.3468 47.4369 ... 47.7033 47.4601 47.7589] **************************************************************************************************** Unique Values of long are [-122.258 -122.091 -122.111 -122.268 -122.358 -122.365 -121.917 -122.025 -122.294 -122.333 -122.157 -122.396 -121.8 -122.357 -122.337 -122.081 -122.275 -122.121 -122.364 -122.124 -122. -122.023 -122.404 -122.391 -122.118 -122.306 -122.251 -122.105 -122.363 -122.043 -122.361 -122.283 -122.033 -122.322 -122.398 -122.335 -122.314 -122.162 -122.32 -122.097 -122.022 -122.289 -121.957 -122.04 -122.366 -122.311 -122.287 -122.028 -122.133 -122.193 -122.307 -122.001 -122.192 -122.324 -122.312 -122.245 -122.068 -122.303 -121.968 -122.308 -122.218 -122.151 -122.323 -122.002 -122.359 -122.305 -122.342 -122.059 -122.042 -122.174 -122.348 -122.189 -122.211 -122.378 -122.321 -122.102 -122.217 -122.145 -122.29 -121.789 -122.084 -122.27 -122.135 -122.205 -122.096 -122.386 -122.34 -122.261 -122.212 -122.099 -122.292 -122.037 -122.415 -122.159 -122.368 -122.114 -122.387 -122.373 -122.388 -122.195 -122.197 -122.216 -122.319 -122.014 -122.112 -122.28 -122.164 -122.263 -122.285 -122.352 -122.034 -122.318 -122.026 -122.126 -122.267 -122.327 -122.374 -122.38 -122.191 -121.783 -122.087 -122.19 -122.215 -122.128 -122.381 -122.334 -121.841 -122.17 -122.047 -122.004 -121.93 -122.375 -122.372 -122.301 -122.168 -122.226 -122.15 -122.194 -122.187 -122.238 -122.286 -122.177 -122.296 -122.298 -121.714 -122.224 -122.18 -122.377 -122.158 -122.017 -122.291 -122.332 -122.356 -121.995 -122.011 -122.397 -122.304 -122.392 -121.939 -122.094 -122.143 -122.139 -122.277 -122.389 -122.316 -122.049 -121.996 -122.166 -122.061 -122.173 -122.36 -122.282 -121.877 -122.103 -122.271 -122.08 -122.058 -122.183 -122.066 -122.225 -122.385 -122.236 -121.713 -122.223 -122.384 -122.329 -122.254 -122.3 -122.095 -122.395 -122.12 -122.309 -121.875 -122.257 -122.031 -121.932 -122.273 -122.018 -122.175 -122.072 -122.371 -122.032 -122.153 -122.39 -122.355 -122.041 -122.123 -122.154 -122.005 -122.315 -122.019 -122.382 -122.284 -122.221 -122.147 -122.107 -122.338 -122.039 -122.016 -122.161 -121.993 -122.288 -122.343 -122.264 -122.351 -122.146 -122.233 -122.085 -121.859 -122.383 -122.163 -122.069 -122.172 -122.262 -121.851 -121.994 -122.228 -122.122 -122.181 -121.998 -122.027 -122.117 -122.362 -122.045 -122.341 -122.393 -122.206 -122.207 -122.367 -122.119 -122.276 -122.082 -122.242 -122.265 -122.13 -122.188 -122.244 -122.046 -122.199 -122.299 -122.293 -122.079 -122.406 -122.26 -122.336 -122.24 -122.156 -122.21 -122.23 -122.235 -122.136 -122.297 -121.867 -122.349 -122.317 -122.279 -122.137 -122.108 -122.274 -122.295 -121.77 -121.989 -122.013 -122.062 -122.354 -122.37 -122.379 -122.142 -121.969 -122.256 -122.024 -122.325 -121.762 -121.985 -122.178 -122.409 -121.874 -122.149 -122.403 -122.345 -122.155 -122.331 -122.138 -122.171 -122.152 -122.196 -122.11 -122.03 -121.775 -122.41 -121.99 -122.339 -122.074 -122.02 -122.204 -121.868 -122.369 -122.134 -122.35 -122.078 -122.036 -122.353 -122.113 -122.255 -122.088 -122.179 -122.182 -122.203 -122.2 -121.321 -121.878 -122.185 -122.1 -121.773 -121.861 -122.086 -122.064 -122.184 -121.962 -122.259 -122.116 -122.115 -122.106 -122.465 -122.209 -121.912 -122.269 -122.33 -122.198 -122.057 -122.101 -121.96 -122.237 -121.876 -122.328 -121.774 -121.744 -122.132 -122.281 -122.131 -122.065 -122.16 -122.213 -122.22 -122.234 -122.2672 -122.229 -122.141 -122.038 -122.278 -122.313 -122.06 -122.243 -122.009 -122.248 -122.165 -122.015 -122.169 -122.048 -122.227 -122.272 -121.997 -121.927 -122.326 -122.044 -122.401 -122.003 -122.176 -121.723 -122.054 -122.31 -122.186 -121.972 -122.347 -122.144 -122.266 -122.202 -122.07 -122.076 -122.222 -122.089 -122.093 -122.219 -122.067 -122.376 -122.208 -122.214 -121.963 -121.75 -122.167 -122.029 -122.399 -121.869 -121.97 -121.965 -121.991 -122.129 -122.021 -122.01 -121.879 -122.394 -121.901 -121.882 -121.799 -122.083 -122.077 -122.302 -121.786 -122.063 -121.771 -122.231 -121.725 -122.109 -121.973 -121.914 -121.982 -121.984 -122.201 -122.249 -121.889 -122.47 -121.929 -122.443 -121.892 -122.14 -122.462 -122.098 -122.402 -121.974 -122.344 -122.052 -122.346 -121.999 -122.239 -122.44 -122.422 -121.959 -122.055 -122.073 -122.05 -121.977 -122.09 -121.966 -121.911 -122.125 -122.035 -122.148 -122.056 -121.854 -121.831 -122.4 -121.853 -122.519 -122.497 -121.988 -122.463 -122.232 -122.092 -121.781 -121.797 -122.071 -122.515 -121.746 -122.444 -122.405 -121.94 -121.803 -122.247 -121.881 -122.006 -122.053 -122.012 -121.933 -122.008 -121.883 -122.252 -121.887 -121.763 -121.747 -121.86 -121.953 -121.766 -121.888 -121.865 -122.127 -122.3018 -121.986 -121.757 -121.976 -121.733 -122.484 -122.104 -121.926 -121.955 -122.45 -121.792 -122.49 -122.051 -121.98 -121.764 -121.707 -121.802 -121.897 -122.412 -121.886 -121.898 -122.411 -121.752 -121.915 -121.862 -121.992 -121.913 -121.756 -121.983 -121.975 -121.899 -121.755 -121.872 -121.871 -121.73 -121.701 -121.787 -121.646 -121.852 -121.88 -121.967 -122.408 -121.971 -121.815 -121.979 -122.472 -122.446 -121.91 -121.85 -121.849 -122.253 -121.964 -122.007 -121.981 -122.246 -121.858 -121.907 -121.893 -121.922 -122.455 -122.504 -121.909 -121.958 -121.896 -121.949 -121.866 -122.461 -121.961 -121.721 -121.902 -121.89 -121.908 -121.936 -121.772 -122.43 -121.765 -122.413 -122.25 -121.676 -122.241 -121.364 -122.448 -121.788 -121.873 -122.449 -121.78 -122.407 -121.738 -121.737 -122.438 -122.2416 -122.431 -121.734 -121.906 -121.978 -121.769 -121.952 -122.034 -122.474 -121.921 -121.809 -122.1466 -121.826 -121.838 -121.779 -121.829 -121.761 -121.9 -121.87 -122.2118 -121.768 -121.885 -121.894 -121.916 -121.956 -121.856 -121.718 -122.425 -121.821 -121.937 -122.507 -121.904 -122.511 -122.2202 -121.943 -122.502 -122.42 -122.445 -122.503 ] **************************************************************************************************** Unique Values of living_measure15 are [1880. 2480. 3060. 1100. 1220. 1700. 2050. 2070. 1490. 1570. 1980. 1460. 1920. 1370. 1320. 2300. 1390. 1997. 1380. 2250. 2390. 1350. 1620. 980. 1820. 830. 1660. 1640. 1790. 1150. 1990. 2020. 2760. 1410. 1010. 1480. 2580. 2600. 2780. 1890. 2890. 1520. 2370. 1060. 2630. 2560. 2750. 1590. 2080. 2640. 1050. 2130. 1680. 1445. 4700. 1580. 930. 2500. 1955. 2460. 2920. 1910. 1290. 880. 3360. 1240. 2770. 2230. 1670. 1810. 1080. 2170. 1770. 2160. 2830. 1210. 4050. 1560. 1510. 3230. 1190. 1200. 2660. 2000. 2280. 2010. 1960. 1750. 1440. 1740. 1020. 3680. 3830. 4890. 2403. 1550. 1125. 780. 1420. 1630. 1760. 1860. 1780. 1140. 2820. 1720. 2100. 1120. 2440. 1800. 1530. 1270. 1330. 1360. 1600. 1110. 1280. 1470. 3030. 4120. 1870. 1321. 2360. 2550. 2029. 2670. 2790. 1940. 2430. 1260. 1650. 2260. 2800. 1300. 1160. 2810. 2610. 2200. 2190. 2040. 1340. 2140. 2120. 2290. 2530. 1180. 2090. 1730. 1400. 3290. 4560. 1430. 3190. 2270. 2400. 2910. 1950. 2590. 990. 1030. 2710. 2240. 2720. 2970. 1688. 2030. 3250. 2980. 1850. 2350. 1840. 1610. 3220. 1000. 2541.8 1310. 2310. 4362. 3040. 3130. 1512. 2740. 1450. 3100. 1970. 2410. 2470. 3730. 1500. 2434. 3370. 1710. 1830. 2960. 920. 3270. 1438. 3150. 2510. 1078. 870. 998. 3300. 1540. 740. 3160. 2210. 3050. 4070. 1690. 1250. 2380. 2950. 2420. 3850. 2981. 2220. 1802. 2690. 2340. 2450. 2870. 2238. 3590. 2570. 3000. 3500. 1930. 3570. 2540. 3380. 1090. 2005. 960. 3970. 2060. 1900. 4630. 2330. 2180. 820. 3740. 4290. 2520. 2594. 2840. 3180. 2402. 3140. 3310. 1369. 3020. 1528. 3470. 2320. 1384. 2900. 1509. 1934. 1494. 2880. 1230. 4440. 2730. 2502. 1404. 2665. 3280. 3240. 4240. 1902. 3413. 2415. 1170. 2850. 3620. 3430. 1811. 4100. 2110. 1630. 1763. 2620. 3600. 2597. 2650. 2849. 1130. 940. 2767. 840. 1263.6 3330. 1716. 3090. 3340. 3450. 1131. 3420. 4800. 2680. 1684. 1459. 3540. 3560. 4750. 2083. 2700. 4170. 3625. 4520. 3400. 4650. 3990. 3070. 3490. 1070. 1277. 2648. 3920. 3110. 2150. 860. 3200. 3800. 3760. 1714. 3750. 4913. 1040. 2054. 3010. 2441. 1892. 2940. 1365. 890. 1488. 3770. 1941. 2323. 3120. 3890. 850. 1032. 2996. 1528. 2990. 2409. 2912. 1544. 3460. 3480. 4920. 3720. 3056. 4090. 950. 2490. 2860. 3440. 4410. 3860. 2106. 3790. 952. 900. 3210. 1564. 5030. 1332. 2930. 1358. 5170. 2901. 1429. 1463. 3045. 1624.6 1076. 4350. 3950. 1898. 5790. 1884. 1232. 3550. 4270. 4060. 3510. 3650. 2566. 2363. 4640. 2246. 1584. 1168. 2019. 2228. 3170. 1162. 4000. 2478. 690. 1276. 3193. 3087. 3260. 4480. 1438. 5000. 2234. 2136. 2488. 3880. 1414. 4040. 3710. 1452. 1492. 2297. 1767. 1981. 2255. 3568. 1088. 4460. 1458. 3910. 970. 3557. 1408. 670. 4340. 1919. 4330. 1668. 2389. 3810. 1984. 4210. 2076. 1746. 3640. 2254. 5330. 3690. 3670. 1658. 3840. 1677. 2154. 1497.6 6210. 2296. 2406. 2273. 3074. 2513. 3580. 2683. 2605. 3390. 3715. 3350. 3700. 2978. 2028. 3410. 1447. 1768. 1691. 3870. 2165. 4590. 4760. 1784. 4010. 1356. 2009. 1574. 2755. 1894. 4160. 2419. 3900. 4080. 4190. 1949. 1815. 1938. 2242. 3930. 800. 3530. 1536. 2077. 1998.4 4280. 4550. 2425. 1217. 3660. 2616. 910. 1468. 1798. 4110. 2052. 4180. 4300. 4320. 1646. 806. 2049. 1639. 1654. 1813. 2315. 3610. 3515. 4850. 2622. 2095. 1571. 1518. 460. 3780. 2439. 2875. 3940. 3080. 4370. 2844. 1342. 1975. 3721. 2216. 4260. 3520. 2848. 760. 3736. 1137. 3078. 2955. 2619. 1686. 770. 3402. 5080. 5380. 2738. 4620. 3056. 3980. 5500. 4470. 2258. ] **************************************************************************************************** Unique Values of lot_measure15 are [ 12000. 40500. 6567. ... 8683. 6495. 205603.] **************************************************************************************************** Unique Values of log_living_measure are [7.57558465 7.75790621 8.11969625 7.7450028 6.68461173 7.18538702 8.18868912 7.3395377 7.02997291 7.79564654 7.12286666 7.56008047 7.25134498 6.85646198 7.23705903 7.53369371 7.19293422 7.7363071 7.7406644 7.29979737 7.15461536 7.80791663 7.36518013 7.42057891 6.65929392 7.53902706 7.38398946 7.17011954 7.50659178 7.61085279 7.84776254 7.57044325 7.63046126 6.7214257 7.66387726 7.81197343 7.78738203 7.42654907 7.54433211 7.85166118 7.56527528 6.97541393 7.08170859 8.01961279 8.09864284 6.91770561 7.46164039 7.64491934 6.69703425 7.47306909 7.7664169 7.37775891 7.41457288 8.42726848 7.33302301 7.77904864 7.59588992 7.89357207 8.12266802 7.58069975 8.0163179 7.11476945 6.77992191 7.3065314 8.54090972 7.74932246 7.70074779 7.79975332 8.00302867 7.46737107 7.52833177 7.40245152 7.52294092 7.40853057 6.95654544 7.48436864 7.79152282 8.40514369 6.57925121 8.278936 8.11372609 7.51752085 7.84384864 8.8261474 7.60090246 7.8709296 7.43838353 7.71423114 7.39633529 7.55485852 6.96602419 7.65917137 7.67322312 8.01301211 8.53503311 6.99393298 7.6778635 8.70863966 7.43248381 7.9827577 7.04751722 7.85864066 7.70975686 7.31986493 6.63331843 7.07326972 7.3588309 7.32646561 7.63530389 7.62070509 7.03878354 8.23217424 7.45007957 7.58578882 7.17778242 7.4899709 7.83201418 7.51207125 7.61579107 6.73340189 7.09007684 7.34601021 7.09837564 8.18032087 7.68248245 8.32360844 7.27239839 7.21523998 7.72312009 7.61529834 7.99294455 8.6053872 7.90838716 7.90100705 7.94449216 8.19146305 7.13089883 7.05617528 8.04237801 7.41276402 7.91205689 7.25841215 7.50108212 7.99967858 7.82404601 7.78322402 7.92298596 6.82437367 7.88983375 7.9373747 8.13446757 7.59085212 7.29301768 7.00306546 8.24538447 8.49902922 8.07402622 7.69621264 7.64969262 8.61068353 8.28652137 7.138867 8.14902387 8.24275635 6.8134446 7.20042489 7.90470391 7.1623975 8.06463648 8.16337132 7.65444323 8.13153071 7.3524411 7.86326672 7.73193072 7.70526247 7.31322039 7.24422752 8.16621627 6.64639051 6.80239476 7.81601384 7.49554194 7.47873483 8.33519158 8.21878716 8.20794694 6.92755791 6.86693328 7.22256602 7.22983878 7.9793389 7.68708016 6.87729607 7.64012317 7.94093976 7.26542972 8.20248245 7.82204401 7.45587669 7.47533924 7.85941315 6.90775528 7.8038433 8.41183268 8.03592637 8.00969536 6.62007321 6.89770494 8.37562963 8.17470288 6.88755257 7.14677218 7.54960917 7.06475903 7.82003799 6.98286275 8.1067325 7.95155933 6.90575328 8.05197808 8.10772006 7.9585769 8.40960798 7.96901178 6.50727771 6.2146081 8.21066803 7.13489085 8.45318786 7.10660614 7.66856111 8.00636757 8.26873183 7.01211529 8.4250779 8.29654652 8.2839993 7.86095636 8.08948247 6.55108034 7.9157132 7.91935619 7.72753511 8.0677762 8.07090609 8.03268488 7.69165682 8.09254526 6.93731408 7.77064523 7.9266026 7.3714893 7.39018143 7.44716836 7.8671055 7.7186855 7.28619171 8.39840966 8.15190987 6.83518459 7.93020621 7.93379687 7.77485577 7.85554468 7.20785987 8.25322765 7.97246602 7.02108396 7.83991936 8.23483028 6.39692966 6.79122146 8.05515773 8.34521793 7.62559507 7.97590836 6.70930434 7.8785342 7.89729647 8.31458729 8.16905315 7.59387784 6.74523635 7.76217061 8.78416222 8.08332861 7.60589 8.02617019 7.99631723 8.02943284 6.94697599 7.88231492 8.29154651 7.75362355 8.06337782 7.94803199 7.34794382 8.03915739 8.19698793 6.67203295 7.47816969 7.82803803 8.00770001 8.32603269 8.34995727 8.38935982 8.14612951 8.11671562 8.20521843 8.08023742 8.67317077 8.27129265 8.98844604 7.50163446 8.6429444 6.75693239 7.87473913 7.25347038 8.14031554 7.8860814 8.25582843 7.27931884 8.21338174 8.02289687 8.02486215 6.3630281 8.5754621 6.84587988 8.30893825 8.25062008 7.85709386 6.38012254 6.01615716 6.79570578 8.65869275 7.44424865 8.05832731 8.18311808 8.15478757 8.80537514 8.88461023 6.56526497 8.26616444 7.46851327 8.52714352 8.34045601 7.96206731 7.43189192 8.06148687 7.96554557 8.17188201 8.07713664 8.4763712 7.95507427 7.98616486 8.73873546 8.16051825 8.18729927 8.47010158 7.93666016 6.42971948 8.3663703 8.4658999 6.69579892 8.42068229 8.35702444 8.15765702 7.83597458 8.38022734 8.30399997 8.1942293 7.72179178 8.36869318 8.2480057 7.53048 8.39389498 8.25842246 7.4465851 8.04878828 8.85366543 8.26359043 8.19973896 7.7164608 7.98956045 7.82644314 7.37838371 8.44354665 7.33171497 7.75600466 8.10470347 8.13739583 8.45531779 7.74630066 8.22951112 8.04558828 8.26100979 8.33567131 8.07620453 6.34563636 8.48673398 7.50878717 7.79646924 8.44677073 7.59186171 8.23747929 8.27384693 8.30152165 6.98471632 7.31920246 7.6943928 8.39162997 7.84658998 7.18006987 8.28147086 7.02286809 8.384804 8.14322675 8.69282576 6.60665019 8.45105339 8.55256034 8.29404964 8.43598314 7.21376831 8.08641028 8.2160881 8.02125618 7.67925143 6.98100574 8.65521449 8.32239411 7.40913644 8.53050421 8.93853165 7.11639414 8.51719319 8.6499743 8.50106381 8.50916102 8.31385227 6.59304453 8.00703401 7.81641698 7.2758646 8.57922858 8.0955987 7.30586003 8.2763947 8.22416351 8.36404201 7.80628929 6.46146818 7.49665244 6.29156914 6.5220928 8.12563099 8.58858319 8.18785544 8.41626727 8.09132127 8.29903718 8.18590748 8.49084922 8.31139828 7.92117272 8.32117831 8.17751582 7.23273314 8.1285852 7.30921237 8.77801781 8.45744319 6.76849321 8.48052921 7.25417785 8.70450229 6.47697236 8.56978564 8.31874225 8.34877454 7.36770857 6.41345896 8.35936911 8.33086361 8.31630025 7.68937111 7.52131798 8.41847722 8.54675199 7.59135705 8.3428398 8.55833513 7.90691549 7.60489448 8.40290405 6.99209643 8.11072758 8.55448898 7.60339934 8.58298093 7.87549929 6.49223984 7.24992554 7.28482091 7.56734568 8.41405243 7.78447324 7.32383057 8.43814998 8.60337089 7.03085748 8.33806653 7.82963039 8.44246965 8.62155321 8.17611034 8.22147895 8.46379241 8.33327035 7.91352102 8.32845107 8.22684089 6.06378521 7.41336734 8.37332282 7.25981961 8.61974978 8.45956408 7.70661291 7.67508186 8.96826881 8.10167775 7.94979722 7.95296679 7.94732503 7.89469085 8.38251829 7.86518795 7.3472997 8.57640505 7.55433482 7.93987158 7.72267752 6.15273269 8.76561455 8.95027347 7.48717369 8.4316353 7.28961052 8.49494758 7.63771643 8.83927669 8.40626163 7.83913165 8.23906533 7.48211892 8.66388757 7.7151236 8.50309427 8.80086724 8.1875774 8.9896937 5.91350301 7.84070645 8.35231855 8.73391617 8.38708451 6.97354302 8.57734711 7.47703847 7.12205988 8.86502919 8.42836198 8.37793112 8.46800295 7.8046593 6.25382881 8.53699582 8.27257061 8.2550489 8.2401213 7.37400186 7.71601527 8.42288251 8.07153089 6.5366916 7.53208814 7.57763383 7.60539236 8.53895468 7.48885296 7.75961415 7.62657021 8.65172408 8.90788301 8.632306 7.40610338 7.06902343 8.37101068 8.67248608 8.30647216 7.78030309 7.59236613 8.63763934 8.42945428 8.34759041 7.8458075 8.36170829 8.43381158 8.44891435 7.70345905 8.35467426 7.93558739 7.61381868 8.76092338 8.49351506 8.4446225 7.35946764 7.18690102 7.60837447 7.91278922 7.75319427 8.66112036 8.10319175 7.8268421 8.13211877 8.58485184 8.90245559 8.46168048 7.70930833 8.08271113 8.2890371 8.63408694 8.38617293] **************************************************************************************************** Unique Values of log_lot_measure are [ 9.41246456 10.86421613 8.94702566 ... 11.31846655 8.70499968 8.92199141] **************************************************************************************************** Unique Values of log_ceil_measure are [7.13089883 7.75790621 8.11969625 7.29979737 6.68461173 6.99393298 8.18868912 7.3395377 7.02997291 7.25134498 6.86693328 7.56008047 6.85646198 7.57558465 7.23705903 7.53369371 6.80239476 7.7363071 7.7406644 6.92755791 7.15461536 7.50108212 6.97541393 6.65929392 7.00306546 7.38398946 7.17011954 7.02108396 6.98471632 7.57044325 6.93731408 6.7214257 7.66387726 7.3588309 7.78738203 7.42654907 7.3065314 7.85166118 7.56527528 6.76849321 7.64969262 7.08170859 7.32646561 8.09864284 6.91770561 7.46164039 7.64491934 6.69703425 7.47306909 7.07326972 7.37775891 7.53902706 7.41457288 8.42726848 7.77904864 7.48436864 7.89357207 7.47873483 7.10660614 7.42057891 6.96602419 6.77992191 6.64639051 7.74932246 7.70074779 7.55485852 8.00302867 7.1623975 7.03878354 7.52294092 7.40853057 6.74523635 7.79152282 8.40514369 6.57925121 7.19293422 8.08023742 8.11372609 7.44424865 7.84384864 8.71768205 7.7450028 7.8709296 7.28619171 6.88755257 6.90775528 7.06475903 7.67322312 7.70975686 8.53503311 7.6778635 8.17611034 7.52833177 7.04751722 7.85864066 7.11476945 7.20042489 6.63331843 6.82437367 7.8785342 7.45007957 7.22256602 7.24422752 7.17778242 7.4899709 7.43248381 7.51207125 6.73340189 7.09007684 7.61579107 7.9585769 7.68248245 8.0163179 8.28652137 6.89770494 7.21523998 7.72312009 7.61529834 7.79975332 7.31322039 8.09101504 6.8134446 7.12286666 7.90100705 7.14677218 7.94449216 8.19146305 6.3630281 8.04237801 7.41276402 7.91205689 7.99967858 7.82404601 7.71423114 7.36518013 7.78322402 7.92298596 7.63530389 7.25841215 7.7664169 7.3524411 8.24538447 6.70930434 7.20785987 8.07402622 7.18538702 7.69621264 7.81197343 8.61068353 8.18032087 7.65917137 7.84776254 8.24275635 7.39633529 7.09837564 6.50727771 7.90470391 8.16337132 7.65444323 8.13153071 6.95654544 7.26542972 7.86326672 7.73193072 7.70526247 6.83518459 8.16621627 7.81601384 7.49554194 8.07775756 8.21878716 8.20794694 7.46737107 7.22983878 7.9793389 6.87729607 7.58578882 8.20248245 7.29301768 7.82204401 7.45587669 7.47533924 8.12266802 7.85941315 7.76217061 7.87473913 8.03592637 8.00969536 7.34601021 6.62007321 8.01961279 7.33302301 8.17470288 7.51752085 7.40245152 7.31986493 7.82003799 6.98286275 8.14322675 7.3714893 7.75362355 7.54960917 6.6821086 7.98956045 8.40960798 7.61085279 6.79122146 6.38012254 6.2146081 7.138867 7.13489085 8.14612951 6.75693239 7.66856111 7.01211529 8.4250779 7.50659178 8.29654652 7.86095636 7.58069975 6.55108034 7.59085212 7.91935619 7.72753511 7.60589 7.77064523 7.62070509 7.95155933 7.44716836 7.8671055 7.27931884 7.39018143 7.93020621 7.05617528 8.25322765 7.97246602 7.43838353 8.23483028 7.69165682 6.39692966 7.96206731 7.27239839 8.05515773 7.93379687 8.34521793 6.94697599 7.97590836 7.63046126 8.31458729 8.06463648 7.46508274 8.78416222 8.08332861 8.00636757 7.54433211 8.02617019 7.7186855 7.88231492 7.59588992 7.83201418 7.91059061 6.67203295 7.34794382 7.64012317 8.03915739 8.07090609 8.19698793 7.94803199 7.47816969 7.9266026 8.00770001 8.32603269 7.90838716 8.38935982 6.63987583 8.67317077 7.96901178 6.84587988 8.69784669 7.50163446 8.6429444 7.25347038 7.89729647 7.79564654 7.8860814 8.21338174 8.02289687 8.02486215 8.5754621 7.83991936 8.25062008 7.85709386 6.01615716 7.77485577 7.94093976 7.60090246 6.79570578 8.38708451 8.05832731 8.18311808 7.62559507 8.50916102 8.29154651 8.73552519 6.56526497 7.46851327 8.34045601 7.9373747 7.43189192 8.17188201 7.82803803 8.07713664 8.4763712 7.99631723 7.83597458 7.98616486 8.43598314 7.9827577 7.88983375 8.16051825 8.18729927 6.59304453 8.47010158 7.93666016 6.42971948 7.85554468 6.5220928 8.3663703 8.4658999 6.69579892 8.3428398 8.35702444 8.16905315 7.68708016 7.21744343 8.05197808 7.53048 8.39389498 8.25842246 7.96554557 7.95507427 7.4465851 8.04878828 7.80791663 8.26359043 8.19973896 7.7164608 8.1942293 7.82644314 7.33693691 8.25712629 7.33171497 8.06148687 8.10470347 8.13739583 7.74630066 8.22951112 8.04558828 8.26100979 8.33567131 7.85825418 8.14031554 6.34563636 8.48673398 7.50878717 7.79646924 7.59186171 8.23747929 8.27384693 8.30152165 7.31920246 7.6943928 6.49223984 7.84658998 7.18006987 7.02286809 8.26873183 6.60665019 7.8038433 8.36170829 8.13446757 8.29404964 7.21376831 8.08641028 8.384804 8.30399997 8.02125618 7.67925143 6.98100574 8.32239411 7.20117088 8.05642677 8.69617585 7.11639414 7.9157132 8.6499743 8.2480057 7.99294455 8.21066803 8.31385227 8.00703401 7.81641698 6.44571982 8.57922858 8.0677762 7.30586003 8.02943284 8.36404201 7.80628929 6.46146818 7.49665244 6.29156914 8.01301211 8.58858319 8.18785544 8.09132127 8.15478757 8.15765702 8.49084922 8.31139828 7.92117272 8.17751582 7.23273314 7.30921237 8.278936 8.03268488 8.28147086 7.25417785 8.39840966 6.47697236 8.31874225 8.34877454 8.10772006 7.36770857 6.41345896 8.33086361 7.68937111 7.52131798 7.59135705 8.50512061 7.90691549 7.10332206 8.14902387 8.22416351 6.99209643 8.0955987 8.1285852 7.38087904 8.22684089 7.87549929 8.37562963 7.24992554 8.44677073 7.56734568 7.78447324 8.36869318 6.90575328 8.32360844 8.60337089 7.03085748 8.33806653 7.61283103 8.11671562 8.08948247 8.22147895 8.40290405 8.25582843 8.46379241 8.33327035 7.91352102 8.32845107 6.06378521 8.11072758 8.41405243 7.41336734 7.25981961 8.45956408 7.70661291 7.67508186 8.96826881 7.94979722 7.95296679 7.94732503 8.2160881 7.89469085 8.38251829 7.3472997 8.2281769 7.47647238 7.93987158 7.30720231 6.15273269 8.632306 8.71440336 7.48717369 8.18590748 8.41183268 7.63771643 8.48052921 8.2763947 8.40626163 7.83913165 8.09254526 7.38025579 8.39615486 7.7151236 8.75621009 7.49443022 8.9896937 8.15190987 5.91350301 7.84070645 8.35231855 8.46800295 6.97354302 7.47703847 7.12205988 8.39162997 8.65869275 8.27001306 7.8046593 6.25382881 8.38022734 8.27257061 7.74022952 8.2401213 7.71601527 8.07153089 6.5366916 7.53208814 7.57763383 7.60539236 8.53895468 7.48885296 7.34213173 7.62657021 8.51719319 8.31630025 7.40610338 8.37101068 8.12563099 8.20521843 7.78030309 7.59236613 8.49699048 8.27129265 7.8458075 8.44891435 7.70345905 8.10167775 7.93558739 7.88570539 7.61381868 8.76092338 8.25816336 8.55256034 7.35946764 6.99117689 7.60837447 7.91278922 7.75319427 7.8268421 8.55448898 7.95366978 7.70930833 8.80086724 7.42535789 7.76472054] **************************************************************************************************** Unique Values of log_basement are [6.55250789 0. 6.72262979 5.44241771 6.92853782 5.63835467 6.06610809 5.70711026 6.47850964 6.11146734 6.39859493 6.66057515 6.80350526 7.1553963 6.13339804 6.93828448 5.99396143 5.30330491 5.88887796 7.32712329 7.07411682 5.35185813 7.37838371 6.60800063 7.215975 5.14166356 6.58063914 7.51261754 6.27476202 6.73459166 4.94875989 6.56667243 6.7580945 7.16317239 5.48479693 6.04263283 6.90875478 4.39444915 6.85751406 6.67329797 7.82444593 6.44730586 6.97634807 5.83188248 6.34738921 6.82546004 6.83625928 7.02197642 5.67332327 5.73979291 5.25227343 5.01727984 6.50876914 7.28687641 7.69484807 5.86078622 6.36475076 6.15485809 6.08904488 6.62140565 7.13169851 7.34018684 6.19644413 7.49609735 4.87519732 6.46302946 5.56452041 6.68586095 7.01301579 6.76964198 6.2166061 6.96696714 6.85329909 6.29341928 7.09090982 6.17586727 5.80211838 6.89871453 7.53422833 7.68294317 6.59441346 6.99484999 5.3981627 5.94279938 6.23636959 5.91620206 7.25205395 7.26612878 4.79579055 6.32972091 7.1861443 6.53813982 6.84694314 6.95749737 7.12367279 7.57609734 6.87832647 6.74641213 7.19368582 7.17088848 6.63463336 7.00397414 7.03085748 6.64768837 7.70571282 6.8145429 7.22329568 5.60211882 5.77144112 7.10742547 6.31173481 5.52545294 6.52356231 7.8559322 7.06561336 6.25575004 7.31388683 6.98564182 7.60140233 6.43133108 6.78105763 6.49375384 6.79234443 7.05703698 4.51085951 5.19849703 6.38181602 6.94793707 6.69826805 6.86797441 7.32052696 6.71052311 7.37211803 5.58349631 7.61134772 7.25911613 5.96870756 7.61628356 6.41509696 7.43897159 6.01859321 7.03966035 7.44483327 7.24494155 6.91869522 7.3969486 7.2730926 7.13966034 7.56060116 5.08140436 7.67368813 7.08254857 7.41517511 4.61512052 6.79682372 7.48493028 7.52348131 7.51806418 8.16080392 4.18965474 3.93182563 7.62119516 7.36581284 3.71357207 7.20860034 7.20117088 7.28000825 7.63094658 4.26267988 7.04838641 7.64060383 6.88857246 4.7095302 7.56579328 7.11558213 7.66434663 7.14755927 7.55695057 7.40306109 7.23777819 4.98360662 7.68753877 7.17854548 7.35946764 7.43307535 7.60638739 7.23056315 7.46794233 7.84424072 7.42714413 6.67582322 3.04452244 7.85205021 6.23244802 6.25190388 7.69666708 7.09920174 7.40913644 6.24610677 7.69211334 4.11087386 7.35308192 4.9698133 6.64509097 7.39079852 6.03068526 7.3336764 7.52456123 5.15329159 7.49498623 7.62608276 7.4506608 7.34536484 6.228511 7.34665516 7.77946697 6.07764224 7.4905294 7.55013534 2.39789527 7.47363711 7.75405264 7.50494207 6.93244789 5.46383181 7.158514 7.30720231 6.32256524 7.74543561 7.86365127 7.30047281] **************************************************************************************************** Unique Values of log_living_measure15 are [7.53902706 7.81601384 8.02617019 7.00306546 7.10660614 7.43838353 7.62559507 7.63530389 7.3065314 7.3588309 7.59085212 7.28619171 7.56008047 7.22256602 7.18538702 7.7406644 7.23705903 7.59390463 7.22983878 7.7186855 7.77904864 7.20785987 7.39018143 6.88755257 7.50659178 6.7214257 7.41457288 7.40245152 7.4899709 7.04751722 7.59588992 7.61085279 7.92298596 7.25134498 6.91770561 7.29979737 7.85554468 7.86326672 7.93020621 7.54433211 7.96901178 7.32646561 7.77064523 6.96602419 7.87473913 7.84776254 7.91935619 7.3714893 7.64012317 7.8785342 6.95654544 7.66387726 7.42654907 7.2758646 8.45531779 7.36518013 6.83518459 7.82404601 7.57814547 7.80791663 7.9793389 7.55485852 7.1623975 6.77992191 8.11969625 7.12286666 7.9266026 7.70975686 7.42057891 7.50108212 6.98471632 7.68248245 7.47873483 7.6778635 7.94803199 7.09837564 8.30647216 7.3524411 7.31986493 8.08023742 7.08170859 7.09007684 7.8860814 7.60090246 7.73193072 7.60589 7.58069975 7.46737107 7.27239839 7.46164039 6.92755791 8.21066803 8.25062008 8.49494758 7.78447324 7.34601021 7.02553831 6.65929392 7.25841215 7.39633529 7.47306909 7.52833177 7.48436864 7.03878354 7.94449216 7.45007957 7.64969262 7.02108396 7.79975332 7.49554194 7.33302301 7.14677218 7.19293422 7.21523998 7.37775891 7.01211529 7.15461536 7.29301768 8.0163179 8.32360844 7.53369371 7.1861443 7.7664169 7.84384864 7.61529834 7.88983375 7.93379687 7.57044325 7.79564654 7.138867 7.40853057 7.72312009 7.9373747 7.17011954 7.05617528 7.94093976 7.8671055 7.69621264 7.69165682 7.62070509 7.20042489 7.66856111 7.65917137 7.7363071 7.83597458 7.07326972 7.64491934 7.45587669 7.24422752 8.09864284 8.4250779 7.26542972 8.0677762 7.72753511 7.78322402 7.97590836 7.57558465 7.85941315 6.89770494 6.93731408 7.90470391 7.71423114 7.90838716 7.99631723 7.42903969 7.61579107 8.08641028 7.99967858 7.52294092 7.76217061 7.51752085 7.38398946 8.07713664 6.90775528 7.83995501 7.17778242 7.7450028 8.38068595 8.01961279 8.04878828 7.30304792 7.9157132 7.27931884 8.03915739 7.58578882 7.78738203 7.81197343 8.22416351 7.31322039 7.79729127 8.12266802 7.44424865 7.51207125 7.99294455 6.82437367 8.09254526 7.25943443 7.31334971 8.05515773 7.82803803 6.98286275 6.76849321 6.90575328 8.10167775 7.3395377 6.60665019 8.05832731 7.70074779 8.02289687 8.31139828 7.43248381 7.13089883 7.77485577 7.98956045 7.79152282 8.25582843 8.00001409 7.70526247 7.49665244 7.89729647 7.75790621 7.8038433 7.96206731 7.71333789 8.18590748 7.85166118 8.00636757 8.16051825 7.56527528 8.18032087 7.83991936 8.12563099 6.99393298 7.60339934 6.86693328 8.28652137 7.63046126 7.54960917 8.44031215 7.75362355 7.68708016 6.70930434 8.22684089 8.36404201 7.83201418 7.86095636 7.95155933 8.06463648 7.77419364 8.05197808 8.10470347 7.22183583 8.01301211 7.33171497 8.15190987 7.74932246 7.20612685 7.97246602 7.31920246 7.56734568 7.30921237 7.96554557 7.11476945 8.39840966 7.91205689 7.82484569 7.24708058 7.88795934 8.0955987 8.08332861 8.35231855 7.49724139 8.13534695 7.78945457 7.06475903 7.95507427 8.1942293 8.14031554 7.50163446 8.31874225 7.65444323 7.38832147 7.47477218 7.8709296 8.18868912 7.86211221 7.88231492 7.95472333 7.02997291 6.84587988 7.92551898 6.73340189 7.13449894 8.11072758 7.44775128 8.03592637 8.11372609 8.14612951 7.03085748 8.13739583 8.4763712 7.89357207 7.41224049 7.28550655 8.17188201 8.17751582 8.4658999 7.64156444 7.90100705 8.33567131 8.19560957 8.41626727 8.13153071 8.4446225 8.29154651 8.02943284 8.15765702 6.97541393 7.15226886 7.88155992 8.27384693 8.04237801 7.67322312 6.75693239 8.07090609 8.24275635 8.23217424 7.4465851 8.22951112 8.49964003 6.94697599 7.62754439 8.00969536 7.80016307 7.53961977 7.98616486 7.21890971 6.79122146 7.28680958 8.23483028 7.57095858 7.75061473 8.04558828 8.26616444 6.74523635 6.92430723 8.00503334 7.32825682 8.00302867 7.786967 7.97659541 7.34213173 8.14902387 8.15478757 8.50106381 8.22147895 8.0213387 8.31630025 6.85646198 7.82003799 7.9585769 8.14322675 8.39162997 8.25842246 7.65254569 8.2401213 6.85856503 6.80239476 8.07402622 7.35500192 8.52317526 7.18959654 7.9827577 7.21376831 8.55062797 7.97281078 7.26473018 7.2882444 8.02125618 7.39041018 6.98100574 8.37793112 8.28147086 7.53868573 8.66388757 7.54115246 7.11639414 8.17470288 8.35936911 8.30893825 8.16337132 8.20248245 7.85010355 7.76768728 8.44246965 7.71690614 7.36770857 7.06304816 7.61035762 7.70342805 8.06148687 7.05789794 8.29404964 7.81520706 6.5366916 7.12177276 8.06871619 8.03495502 8.08948247 8.40737833 7.25882438 8.51719319 7.71154898 7.6666902 7.81923445 8.26359043 7.25417785 8.30399997 8.21878716 7.26410274 7.30787278 7.3660982 7.7393592 7.47703847 7.59135705 7.72090525 8.17976049 6.99209643 8.40290405 7.38196774 7.28482091 8.27129265 6.87729607 8.17667277 7.24992554 6.50727771 8.37562963 7.5595595 8.37332282 7.39430358 7.77863015 8.24538447 7.59287029 8.34521793 7.63819824 7.46508274 8.19973896 7.71630888 8.58110652 8.21338174 8.20794694 7.41336734 8.25322765 7.42476176 7.67508186 7.28694766 8.73391617 7.71717202 7.7857209 7.72885582 8.03022955 7.82923254 8.18311808 7.89469085 7.86518795 8.1285852 8.22013396 8.11671562 8.2160881 7.99499477 7.15966032 7.61480536 8.13446757 7.27480358 7.47760424 7.43307535 8.26100979 7.68017564 8.4316353 8.46800295 7.47981356 8.29654652 7.21229447 7.60539236 7.34273993 7.92117272 7.54644627 8.33327035 7.79110951 8.26873183 8.31385227 8.34045601 7.5750717 7.50384075 7.54968836 7.26491969 7.7151236 8.2763947 7.97621748 6.68461173 8.16905315 7.33693691 7.63867982 7.57931255 8.36170829 8.42288251 7.7935868 7.10414409 8.20521843 7.86940171 6.8134446 7.29165621 7.37390758 7.49443022 8.32117831 7.62657021 8.33806653 8.3663703 8.37101068 7.33553668 6.69208374 7.47473063 7.62510715 7.40184158 7.41095188 7.50273821 7.74716497 8.17915325 8.1647948 8.48673398 7.87169266 7.64730883 7.35946764 7.32514896 6.13122649 8.23747929 7.7993434 7.96380795 8.278936 8.03268488 8.38251829 7.95296679 7.18437142 8.19146305 7.58832368 8.22174773 7.70345905 8.35702444 8.16621627 7.95437227 6.63331843 8.2257708 7.81504466 7.03614849 8.03203531 7.99125393 7.87054784 7.41720688 6.64639051 8.13211877 8.53306654 8.59044365 7.91498301 8.43814998 8.02486215 8.2890371 8.61250337 8.40514369 7.72223474] **************************************************************************************************** Unique Values of log_lot_measure15 are [ 9.39266193 10.60905725 8.78981239 ... 9.06912237 8.77878793 12.2337024 ] **************************************************************************************************** Unique Values of ceil_1.5 are [0. 1. 0.6 0.4 0.2] **************************************************************************************************** Unique Values of ceil_2.0 are [0. 1. 0.4 0.6 0.8] **************************************************************************************************** Unique Values of ceil_2.5 are [0. 1.] **************************************************************************************************** Unique Values of ceil_3.0 are [0. 1.] **************************************************************************************************** Unique Values of ceil_3.5 are [0. 1.] **************************************************************************************************** Unique Values of coast_1.0 are [0. 1.] **************************************************************************************************** Unique Values of furnished_1.0 are [0. 1. 0.2] **************************************************************************************************** Unique Values of year_sold_2015 are [0. 1.] **************************************************************************************************** Unique Values of warm_month_sold_1.0 are [1. 0.] **************************************************************************************************** Unique Values of zip_price_cat_medium_price are [1. 0.] **************************************************************************************************** Unique Values of zip_price_cat_high_price are [0. 1.] **************************************************************************************************** Unique Values of basement_category_Small Basement are [0. 1.] **************************************************************************************************** Unique Values of basement_category_Large Basement are [1. 0.] ****************************************************************************************************
# Creating interaction terms for training and testing datasets
# interacting furnished with ceil_measure
x_train_im['furnished_log_ceil_measure'] = x_train_im['furnished_1.0']*x_train_im['log_ceil_measure']
x_test_im['furnished_log_ceil_measure'] = x_test_im['furnished_1.0']*x_test_im['log_ceil_measure']
# interacting furnished with living_measure
x_train_im['furnished_log_living_measure'] = x_train_im['furnished_1.0']*x_train_im['log_living_measure']
x_test_im['furnished_log_living_measure'] = x_test_im['furnished_1.0']*x_test_im['log_living_measure']
# interacting furnished with living_measure15
x_train_im['furnished_log_living_measure15'] = x_train_im['furnished_1.0']*x_train_im['log_living_measure15']
x_test_im['furnished_log_living_measure15'] = x_test_im['furnished_1.0']*x_test_im['log_living_measure15']
# interacting ceil with ceil_measure
for i in df['ceil'].dropna().unique().to_list():
if i!= df_log['ceil'].dropna().unique().sort_values().to_list()[0]: #avoiding dummy trap
x_train_im[f'ceil_{i}_log_ceil_measure15'] = x_train_im[f'ceil_{i}']*x_train_im['log_ceil_measure']
x_test_im[f'ceil_{i}_log_ceil_measure15'] = x_test_im[f'ceil_{i}']*x_test_im['log_ceil_measure']
# Creating interactions between each ceil and basement categories. Using a for loop since there are so many to make
for i in df['ceil'].dropna().unique().to_list():
if i!= df_log['ceil'].dropna().unique().sort_values().to_list()[0]:
for j in df_log['basement_category'].dropna().unique().to_list():
if j != df_log['basement_category'].dropna().unique().sort_values().to_list()[0]: #avoiding dummy trap
x_train_im[f'ceil_{i}_{j}'] = x_train_im[f'ceil_{i}']*x_train_im[f'basement_category_{j}']
x_test_im[f'ceil_{i}_{j}'] = x_test_im[f'ceil_{i}']*x_test_im[f'basement_category_{j}']
x_train_im.head(5)
| room_bed | room_bath | living_measure | lot_measure | sight | condition | quality | ceil_measure | basement | yr_built | yr_renovated | lat | long | living_measure15 | lot_measure15 | log_living_measure | log_lot_measure | log_ceil_measure | log_basement | log_living_measure15 | log_lot_measure15 | ceil_1.5 | ceil_2.0 | ceil_2.5 | ceil_3.0 | ceil_3.5 | coast_1.0 | furnished_1.0 | year_sold_2015 | warm_month_sold_1.0 | zip_price_cat_medium_price | zip_price_cat_high_price | basement_category_Small Basement | basement_category_Large Basement | furnished_log_ceil_measure | furnished_log_living_measure | furnished_log_living_measure15 | ceil_2.0_log_ceil_measure15 | ceil_3.0_log_ceil_measure15 | ceil_1.5_log_ceil_measure15 | ceil_2.5_log_ceil_measure15 | ceil_3.5_log_ceil_measure15 | ceil_2.0_Large Basement | ceil_2.0_Small Basement | ceil_3.0_Large Basement | ceil_3.0_Small Basement | ceil_1.5_Large Basement | ceil_1.5_Small Basement | ceil_2.5_Large Basement | ceil_2.5_Small Basement | ceil_3.5_Large Basement | ceil_3.5_Small Basement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2060 | 2.000 | 2.000 | 1830.000 | 2856.000 | 0.000 | 3.000 | 7.000 | 1830.000 | 0.000 | 2005.000 | 0.000 | 48.000 | -122.000 | 1850.000 | 2667.000 | 7.512 | 7.957 | 7.512 | 0.000 | 7.523 | 7.889 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 7.512 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 11759 | 2.000 | 1.000 | 1380.000 | 5820.000 | 0.000 | 3.000 | 7.000 | 1380.000 | 0.000 | 1918.000 | 1976.000 | 48.000 | -122.000 | 1540.000 | 4076.000 | 7.230 | 8.669 | 7.230 | 0.000 | 7.340 | 8.313 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 8274 | 2.000 | 1.000 | 1130.000 | 6908.000 | 0.000 | 3.000 | 6.000 | 1130.000 | 0.000 | 1945.000 | 0.000 | 48.000 | -122.000 | 1150.000 | 6908.000 | 7.030 | 8.840 | 7.030 | 0.000 | 7.048 | 8.840 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 7.030 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 21272 | 3.000 | 2.000 | 1820.000 | 13362.000 | 0.000 | 3.000 | 8.000 | 1220.000 | 600.000 | 1977.000 | 0.000 | 48.000 | -122.000 | 2050.000 | 15000.000 | 7.507 | 9.500 | 7.107 | 6.399 | 7.626 | 9.616 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 14328 | 3.000 | 2.000 | 1660.000 | 7221.000 | 0.000 | 3.000 | 7.000 | 980.000 | 680.000 | 1962.000 | 0.000 | 47.000 | -122.000 | 1770.000 | 8083.000 | 7.415 | 8.885 | 6.888 | 6.524 | 7.479 | 8.998 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
countplot(df_log['basement_category'], order = True, rotation = 60, rotation_perc = 60)
The rest of the variables were created after splitting splitting and imputing the variable. So for exploratory purposes, I rejoined the train and test data together in another df and then added back in log_price so that we could get an overall picture.
x_EDA = pd.concat([x_train_im,x_test_im]).sort_index()
x_EDA = pd.concat([x_EDA,Y[['log_price']]], axis=1)
quantplot(x_EDA['log_price'])
The mean and median for log_price are 13.047886092909524 and 13.017002861746503, respectively. There are 52 below the lower whisker and 282 above the upper whisker.
quantplot(x_EDA.log_living_measure)
The mean and median for log_living_measure are 7.550319698479891 and 7.554858521040676, respectively. There are 54 below the lower whisker and 61 above the upper whisker.
quantplot(x_EDA.log_living_measure15)
The mean and median for log_living_measure15 are 7.539449042519315 and 7.517520850603031, respectively. There are 22 below the lower whisker and 55 above the upper whisker.
quantplot(x_EDA.log_lot_measure)
The mean and median for log_lot_measure are 8.989972620173079 and 8.938531648680692, respectively. There are 971 below the lower whisker and 1574 above the upper whisker.
quantplot(x_EDA.log_lot_measure15)
The mean and median for log_lot_measure15 are 8.960894813760625 and 8.938531648680692, respectively. There are 978 below the lower whisker and 1500 above the upper whisker.
quantplot(x_EDA.log_ceil_measure)
The mean and median for log_ceil_measure are 7.394864524821295 and 7.352441100243582, respectively. There are 11 below the lower whisker and 39 above the upper whisker.
quantplot(x_EDA.log_basement)
The mean and median for log_basement are 2.529417164246639 and 0.0, respectively. There are 0 below the lower whisker and 0 above the upper whisker.
Except for log basement, all continuous variables are now approximately normally distributed after log transformation. Basement has been binned into 3 categories in case basement's skewed distribution is the results of a mix of Gaussians. Interaction terms have been created for combinations of furnished with log_ceil_measure, and both log_living_measures and combinations of ceil and the different basement categories. I have kept the original variables that have been transformed to be used for the algorithms that use regression trees and do not require the target variable and predictors to be normally distributed.
The following block of code creates a function 'checking_vif' to measure the VIF score, the next block of code creates the function 'get_model_score' that reports multiple metrics (RMSE, MAPE, MAE, Economic Cost) to help us evaluate the different models.
Along with the usual metrics to evaluate the models, an additional metric Economic Cost is added to give an additional perspective of the model. If we assume that a house does not sell if it is overpriced, we can consider that as an opportunity cost of not selling the home. Additionally, an underpriced home will sell at a discount, and we can consider the lost potential revenue between what the house could have sold for and what the house sold for is as an opportunity cost as well. Economic Cost is defined as mean of the sum of the sell price for a house our model overpriced and the residual of a house that we underpriced. This metric will be biased against the houses we overpriced since we give greater weight to the house that we overpriced than the one we underpriced.
from statsmodels.stats.outliers_influence import variance_inflation_factor
def checking_vif(train):
vif = pd.DataFrame()
vif["feature"] = train.columns
# calculating VIF for each feature
vif["VIF"] = [
variance_inflation_factor(train.values, i) for i in range(len(train.columns))
]
return vif
# print(checking_vif(x_train))
# Function needed later to undo the normalization
def inv_transformation(feature_transformed, log=False, normalized=False, source=None):
"""
input the standard normal scaled target feature as an array
output the same array without the standard normal scale
"""
if normalized==True:
mu = source.mean().values
sd = source.std().values
feature_transformed = sd*feature_transformed + mu
if log==True:
feature_transformed = np.exp(feature_transformed)
return feature_transformed
## Function to calculate r2_score and RMSE on train and test data
def get_model_score(Y_train,Y_test,p_train=None,p_test=None,flag=True,mean=False,df_mean=None,mean_target=None,mean_X=None):
'''
model : classifier to predict values of X
mean: flag that tells the function that we want to compute a conditional mean model.
If used, only df_mean,mean_target, and mean_x and Y vectors are needed.
df_mean: df to use for conditional mean model
mean_target: target for var for conditional mean model
mean_X: conditional var for conditional mean model
log_n_scale= flag for whether to use inv_log_normalize_price function that reverses
scaling and log transformations, in that order.
'''
if mean==True:
if (mean_target==None) or (mean_X==None):
raise ValueError('df_mean, mean_target, and mean_target are required if mean argument is True')
else:
pred_train = df_mean.loc[(Y_train.index), [mean_target,mean_X]] # Using df_log bc zipcodes are not dummy vars in df
pred_test = df_mean.loc[(Y_test.index), [mean_target,mean_X]]
for i in pred_train.zipcode2.unique():
pred_train.loc[pred_train['zipcode2']==i, 'pred'] = pred_train.loc[pred_train[mean_X]==i, mean_target].mean()
pred_train['resid'] = pred_train[mean_target] - pred_train['pred']
for i in pred_train.zipcode2.unique():
pred_test.loc[pred_test[mean_X]==i, 'pred'] = pred_train.loc[pred_train[mean_X]==i, mean_target].mean()
pred_test['resid'] = pred_test[mean_target] - pred_test['pred']
Y_train = pred_train[mean_target].copy()
p_train = pred_train['resid']
Y_test = pred_test[mean_target].copy()
p_test = pred_test['resid'].copy()
# Calcutating economic cost
for i in range(2):
if i ==0:
values = pd.concat([Y_train, pd.DataFrame(p_train, index=y_train.index)],axis=1)
values['residual'] = values.iloc[:,0] - values.iloc[:,1]
values.loc[values['residual']>0, 'Economic Cost'] = values.loc[values['residual']>0, 'residual']
values.loc[values['residual']<0, 'Economic Cost'] = values.iloc[:,0]
economic_train = values['Economic Cost'].mean()
else:
values = pd.concat([Y_test, pd.DataFrame(p_test, index=y_test.index)],axis=1)
values['residual'] = values.iloc[:,0] - values.iloc[:,1]
values.loc[values['residual']>0, 'Economic Cost'] = values.loc[values['residual']>0, 'residual']
values.loc[values['residual']<0, 'Economic Cost'] = values.iloc[:,0]
economic_test = values['Economic Cost'].mean()
train_r2=metrics.r2_score(Y_train,p_train)
test_r2=metrics.r2_score(Y_test,p_test)
train_rmse=np.sqrt(metrics.mean_squared_error(Y_train,p_train))
test_rmse=np.sqrt(metrics.mean_squared_error(Y_test,p_test))
train_mae=np.sqrt(metrics.mean_absolute_error(Y_train,p_train))
test_mae=np.sqrt(metrics.mean_absolute_error(Y_test,p_test))
train_mape=np.sqrt(metrics.mean_absolute_percentage_error(Y_train,p_train))
test_mape=np.sqrt(metrics.mean_absolute_percentage_error(Y_test,p_test))
# defining an empty list to store train and test results
score_list=[]
#Adding all scores in the list
score_list.extend((train_r2,test_r2,train_rmse,test_rmse,train_mae,test_mae,train_mape,test_mape,economic_train,economic_test))
# If the flag is set to True then only the following print statements will be dispayed, the default value is True
if flag==True:
print("R-sqaure on training set : ",train_r2)
print("R-square on test set : ",test_r2)
print("RMSE on training set : ",train_rmse)
print("RMSE on test set : ",test_rmse)
print("MAE on training set : ",train_mae)
print("MAE on test set : ",test_mae)
print("MAPE on training set : ",train_mape)
print("MAPE on test set : ",test_mape)
print("Economic Cost on training set : ",economic_train)
print("Economic Cost on test set : ",economic_test)
# returning the list with train and test scores
return score_list
## Function to store list of metrics to compare models
# defining empty lists to add train and test results
r2_train = []
r2_test = []
rmse_train= []
rmse_test= []
mae_train= []
mae_test= []
mape_train= []
mape_test= []
econ_train= []
econ_test= []
def store_metrics(metrics_output):
# Adding the error metrics and r2 scores to their respetive lists
# accuracy score
r2_train.append(metrics_output[0])
r2_test.append(metrics_output[1])
rmse_train.append(metrics_output[2])
rmse_test.append(metrics_output[3])
mae_train.append(metrics_output[4])
mae_test.append(metrics_output[5])
mape_train.append(metrics_output[6])
mape_test.append(metrics_output[7])
econ_train.append(metrics_output[8])
econ_test.append(metrics_output[9])
The code below creates a separate dataset for the different set of algorithms. For the linear regression, we are including the log transformed versions of the vars and the interaction terms. Ridge and Lasso get a normalized version of this dataset. XGBoost already accounts for interactions and does not depend on the normality assumption, so it gets the dataset void of log transformations and interaction terms.
x_train_reg = x_train_im.copy()
x_test_reg = x_test_im.copy()
y_train_log = y_train[['log_price']]
y_train.drop(columns='log_price', inplace=True)
y_test_log = y_test[['log_price']]
y_test.drop(columns='log_price',inplace=True)
# Dropping vars we have created alternative vars for
x_train_reg.drop(columns=['living_measure','lot_measure','ceil_measure','living_measure15',
'lot_measure15','basement','log_basement'], inplace=True)
x_test_reg.drop(columns=['living_measure','lot_measure','ceil_measure','living_measure15',
'lot_measure15','basement','log_basement'], inplace=True)
# List of continuous vars for scaling
cont_vars = ['log_living_measure', 'log_lot_measure','log_ceil_measure','log_living_measure15','log_lot_measure15']
# Creating scaled dataset for OLS, Ridge and Lasso regressions
train = pd.concat([y_train_log, x_train_reg[cont_vars]], axis=1)
test = pd.concat([y_test_log, x_test_reg[cont_vars]], axis=1)
scaler = StandardScaler()
x_train_scaled = pd.DataFrame(scaler.fit_transform(train), index=train.index, columns=train.columns)
x_test_scaled = pd.DataFrame(scaler.fit_transform(test), index=test.index, columns=test.columns)
y_train_scaled = x_train_scaled[['log_price']]
x_train_scaled.drop(columns=['log_price'], inplace=True)
x_train_reg[cont_vars] = x_train_scaled
y_test_scaled = x_test_scaled[['log_price']]
x_test_scaled.drop(columns=['log_price'], inplace=True)
x_test_reg[cont_vars] = x_test_scaled
x_train_RnL = x_train_reg.copy()
x_test_RnL = x_test_reg.copy()
# Dataset for XGBoost that includes
x_train_xg = x_train_im.loc[:,:'basement_category_Large Basement']
x_train_xg = pd.DataFrame(scaler.fit_transform(x_train_xg), index=x_train_xg.index, columns=x_train_xg.columns)
x_train_xg = x_train_xg.loc[:,~x_train_xg.columns.str.startswith('log_')]
x_test_xg = x_test_im.loc[:,:'basement_category_Large Basement']
x_test_xg = pd.DataFrame(scaler.fit_transform(x_test_xg), index=x_test_xg.index, columns=x_test_xg.columns)
x_test_xg = x_test_xg.loc[:,~x_test_xg.columns.str.startswith('log_')]
y_train_xg = pd.DataFrame(scaler.fit_transform(y_train), index=y_train.index, columns=y_train.columns)
y_test_xg = pd.DataFrame(scaler.fit_transform(y_test), index=y_test.index, columns=y_test.columns)
The first model to run as a benchmark is the mean of the price conditioned on the var zipcode2. I chose this var given how location is known to be important to real estate prices. This model serves as a naive estimate and its output will serve as a initial benchmark to compare the rest of the models.
metrics_list=get_model_score(Y_train=y_train,Y_test=y_test,mean=True,df_mean=df_log,mean_target='price',mean_X='zipcode2')
store_metrics(metrics_list)
R-sqaure on training set : -1.2251280655033963 R-square on test set : -1.2327705075733544 RMSE on training set : 548140.1410798768 RMSE on test set : 548531.0749144884 MAE on training set : 735.2726609772508 MAE on test set : 735.5435992939621 MAPE on training set : 1.139215409370713 MAPE on test set : 1.1395550175838363 Economic Cost on training set : 540625.8859805949 Economic Cost on test set : 541024.3864623354
The linear regression model attempts to find a linear relationship (line, plane, etc) that best represents the relationship between the group of explanatory variables and the target variable (housing price in our case). To do so, we will estimate OLS model on the full set of explantory variables. Then we will weed out those variables who have high VIF scores and/or statistically insignificant estimated coefficients.
# Statsmodel api does not add a constant by default. We need to add it explicitly.
x_train_reg = sm.add_constant(x_train_reg)
# Add constant to test data
x_test_reg = sm.add_constant(x_test_reg)
# This function helps streamline the code to build the OLS model
def build_ols_model(train):
# Create the model
olsmodel = sm.OLS(y_train_scaled, train)
return olsmodel.fit()
olsmodel1 = build_ols_model(x_train_reg)
print(olsmodel1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.803
Model: OLS Adj. R-squared: 0.802
Method: Least Squares F-statistic: 1427.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:38:58 Log-Likelihood: -9189.0
No. Observations: 15129 AIC: 1.847e+04
Df Residuals: 15085 BIC: 1.880e+04
Df Model: 43
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -100.1996 10.840 -9.243 0.000 -121.448 -78.951
room_bed -0.0471 0.006 -8.503 0.000 -0.058 -0.036
room_bath 0.0797 0.007 10.805 0.000 0.065 0.094
sight 0.0929 0.006 16.586 0.000 0.082 0.104
condition 0.1195 0.006 18.993 0.000 0.107 0.132
quality 0.2346 0.007 33.662 0.000 0.221 0.248
yr_built -0.0045 0.000 -21.880 0.000 -0.005 -0.004
yr_renovated 7.895e-05 9.67e-06 8.168 0.000 6e-05 9.79e-05
lat 0.8411 0.009 91.124 0.000 0.823 0.859
long -0.5446 0.088 -6.162 0.000 -0.718 -0.371
log_living_measure 0.1746 0.025 6.975 0.000 0.126 0.224
log_lot_measure 0.0453 0.010 4.749 0.000 0.027 0.064
log_ceil_measure 0.1131 0.024 4.693 0.000 0.066 0.160
log_living_measure15 0.1826 0.007 26.235 0.000 0.169 0.196
log_lot_measure15 -0.0362 0.009 -3.868 0.000 -0.055 -0.018
ceil_1.5 -0.1822 0.327 -0.558 0.577 -0.823 0.458
ceil_2.0 -0.3194 0.226 -1.411 0.158 -0.763 0.124
ceil_2.5 -0.9554 1.071 -0.892 0.373 -3.056 1.145
ceil_3.0 -0.6030 0.505 -1.194 0.233 -1.593 0.387
ceil_3.5 -5.7190 5.774 -0.991 0.322 -17.036 5.598
coast_1.0 0.8097 0.047 17.215 0.000 0.717 0.902
furnished_1.0 1.5384 0.324 4.747 0.000 0.903 2.174
year_sold_2015 0.1030 0.008 13.153 0.000 0.088 0.118
warm_month_sold_1.0 0.0482 0.009 5.602 0.000 0.031 0.065
zip_price_cat_medium_price -0.0035 0.018 -0.193 0.847 -0.039 0.032
zip_price_cat_high_price 0.1676 0.019 8.971 0.000 0.131 0.204
basement_category_Small Basement 0.0761 0.023 3.376 0.001 0.032 0.120
basement_category_Large Basement 0.0663 0.035 1.885 0.060 -0.003 0.135
furnished_log_ceil_measure -0.1202 0.055 -2.174 0.030 -0.229 -0.012
furnished_log_living_measure 0.5702 0.061 9.416 0.000 0.451 0.689
furnished_log_living_measure15 -0.6479 0.042 -15.526 0.000 -0.730 -0.566
ceil_2.0_log_ceil_measure15 0.0480 0.030 1.597 0.110 -0.011 0.107
ceil_3.0_log_ceil_measure15 0.1076 0.068 1.572 0.116 -0.027 0.242
ceil_1.5_log_ceil_measure15 0.0319 0.045 0.717 0.474 -0.055 0.119
ceil_2.5_log_ceil_measure15 0.1373 0.137 1.004 0.316 -0.131 0.405
ceil_3.5_log_ceil_measure15 0.8187 0.765 1.071 0.284 -0.680 2.317
ceil_2.0_Large Basement 0.0213 0.028 0.766 0.443 -0.033 0.076
ceil_2.0_Small Basement 0.0591 0.025 2.398 0.016 0.011 0.107
ceil_3.0_Large Basement -0.3712 0.134 -2.779 0.005 -0.633 -0.109
ceil_3.0_Small Basement 0.0010 0.069 0.014 0.989 -0.135 0.137
ceil_1.5_Large Basement 0.0178 0.034 0.527 0.598 -0.048 0.084
ceil_1.5_Small Basement 0.0382 0.035 1.092 0.275 -0.030 0.107
ceil_2.5_Large Basement -0.0713 0.118 -0.605 0.545 -0.302 0.160
ceil_2.5_Small Basement 0.0695 0.106 0.653 0.514 -0.139 0.278
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 417.829 Durbin-Watson: 2.021
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1037.072
Skew: -0.062 Prob(JB): 6.35e-226
Kurtosis: 4.277 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False).head(15))
feature VIF 0 const 8985798.515 29 furnished_log_living_measure 2888.110 28 furnished_log_ceil_measure 2333.389 30 furnished_log_living_measure15 1328.049 21 furnished_1.0 1275.748 31 ceil_2.0_log_ceil_measure15 977.453 16 ceil_2.0 923.311 35 ceil_3.5_log_ceil_measure15 673.764 19 ceil_3.5 673.749 33 ceil_1.5_log_ceil_measure15 669.635 15 ceil_1.5 664.598 17 ceil_2.5 656.447 34 ceil_2.5_log_ceil_measure15 656.038 32 ceil_3.0_log_ceil_measure15 544.874 18 ceil_3.0 541.028
x_train_reg.drop(columns='furnished_log_living_measure', inplace=True)
olsmodel2 = build_ols_model(x_train_reg)
print(olsmodel2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.802
Model: OLS Adj. R-squared: 0.801
Method: Least Squares F-statistic: 1451.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:03 Log-Likelihood: -9233.3
No. Observations: 15129 AIC: 1.855e+04
Df Residuals: 15086 BIC: 1.888e+04
Df Model: 42
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -100.1874 10.872 -9.215 0.000 -121.498 -78.877
room_bed -0.0484 0.006 -8.720 0.000 -0.059 -0.038
room_bath 0.0869 0.007 11.810 0.000 0.073 0.101
sight 0.1006 0.006 18.106 0.000 0.090 0.111
condition 0.1183 0.006 18.763 0.000 0.106 0.131
quality 0.2371 0.007 33.945 0.000 0.223 0.251
yr_built -0.0044 0.000 -21.648 0.000 -0.005 -0.004
yr_renovated 8.01e-05 9.69e-06 8.264 0.000 6.11e-05 9.91e-05
lat 0.8400 0.009 90.741 0.000 0.822 0.858
long -0.5442 0.089 -6.140 0.000 -0.718 -0.370
log_living_measure 0.1847 0.025 7.369 0.000 0.136 0.234
log_lot_measure 0.0482 0.010 5.038 0.000 0.029 0.067
log_ceil_measure 0.1110 0.024 4.593 0.000 0.064 0.158
log_living_measure15 0.1739 0.007 25.127 0.000 0.160 0.187
log_lot_measure15 -0.0381 0.009 -4.050 0.000 -0.056 -0.020
ceil_1.5 -0.0975 0.328 -0.298 0.766 -0.739 0.544
ceil_2.0 -0.3067 0.227 -1.351 0.177 -0.751 0.138
ceil_2.5 -1.3275 1.074 -1.236 0.216 -3.432 0.777
ceil_3.0 -0.5853 0.507 -1.156 0.248 -1.578 0.408
ceil_3.5 -5.6205 5.790 -0.971 0.332 -16.970 5.729
coast_1.0 0.8040 0.047 17.047 0.000 0.712 0.896
furnished_1.0 2.3124 0.314 7.355 0.000 1.696 2.929
year_sold_2015 0.1030 0.008 13.124 0.000 0.088 0.118
warm_month_sold_1.0 0.0489 0.009 5.665 0.000 0.032 0.066
zip_price_cat_medium_price -0.0043 0.018 -0.237 0.813 -0.040 0.032
zip_price_cat_high_price 0.1677 0.019 8.951 0.000 0.131 0.204
basement_category_Small Basement 0.0730 0.023 3.230 0.001 0.029 0.117
basement_category_Large Basement 0.0754 0.035 2.136 0.033 0.006 0.145
furnished_log_ceil_measure 0.2436 0.040 6.139 0.000 0.166 0.321
furnished_log_living_measure15 -0.5320 0.040 -13.304 0.000 -0.610 -0.454
ceil_2.0_log_ceil_measure15 0.0435 0.030 1.442 0.149 -0.016 0.103
ceil_3.0_log_ceil_measure15 0.1024 0.069 1.491 0.136 -0.032 0.237
ceil_1.5_log_ceil_measure15 0.0200 0.045 0.448 0.654 -0.067 0.107
ceil_2.5_log_ceil_measure15 0.1821 0.137 1.329 0.184 -0.087 0.451
ceil_3.5_log_ceil_measure15 0.8035 0.767 1.048 0.295 -0.699 2.307
ceil_2.0_Large Basement 0.0982 0.027 3.694 0.000 0.046 0.150
ceil_2.0_Small Basement 0.0751 0.025 3.049 0.002 0.027 0.123
ceil_3.0_Large Basement -0.2517 0.133 -1.888 0.059 -0.513 0.010
ceil_3.0_Small Basement 0.0215 0.069 0.309 0.757 -0.115 0.158
ceil_1.5_Large Basement 0.0150 0.034 0.443 0.658 -0.051 0.081
ceil_1.5_Small Basement 0.0389 0.035 1.109 0.268 -0.030 0.108
ceil_2.5_Large Basement 0.0222 0.118 0.188 0.851 -0.208 0.253
ceil_2.5_Small Basement 0.0958 0.107 0.898 0.369 -0.113 0.305
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 404.123 Durbin-Watson: 2.020
Prob(Omnibus): 0.000 Jarque-Bera (JB): 996.348
Skew: -0.047 Prob(JB): 4.42e-217
Kurtosis: 4.254 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False).head(15))
feature VIF 0 const 8985798.390 29 furnished_log_living_measure15 1212.587 28 furnished_log_ceil_measure 1194.485 21 furnished_1.0 1193.667 30 ceil_2.0_log_ceil_measure15 977.204 16 ceil_2.0 923.278 34 ceil_3.5_log_ceil_measure15 673.761 19 ceil_3.5 673.747 32 ceil_1.5_log_ceil_measure15 669.095 15 ceil_1.5 664.095 17 ceil_2.5 655.554 33 ceil_2.5_log_ceil_measure15 655.243 31 ceil_3.0_log_ceil_measure15 544.838 18 ceil_3.0 541.021 10 log_living_measure 47.791
x_train_reg.drop(columns='furnished_log_living_measure15', inplace=True)
olsmodel3 = build_ols_model(x_train_reg)
print(olsmodel3.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.799
Model: OLS Adj. R-squared: 0.799
Method: Least Squares F-statistic: 1465.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:06 Log-Likelihood: -9321.6
No. Observations: 15129 AIC: 1.873e+04
Df Residuals: 15087 BIC: 1.905e+04
Df Model: 41
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -102.0716 10.934 -9.335 0.000 -123.504 -80.639
room_bed -0.0503 0.006 -9.018 0.000 -0.061 -0.039
room_bath 0.0868 0.007 11.722 0.000 0.072 0.101
sight 0.0968 0.006 17.353 0.000 0.086 0.108
condition 0.1174 0.006 18.510 0.000 0.105 0.130
quality 0.2374 0.007 33.793 0.000 0.224 0.251
yr_built -0.0043 0.000 -20.857 0.000 -0.005 -0.004
yr_renovated 8.152e-05 9.75e-06 8.362 0.000 6.24e-05 0.000
lat 0.8421 0.009 90.459 0.000 0.824 0.860
long -0.5565 0.089 -6.242 0.000 -0.731 -0.382
log_living_measure 0.1916 0.025 7.598 0.000 0.142 0.241
log_lot_measure 0.0544 0.010 5.664 0.000 0.036 0.073
log_ceil_measure 0.1219 0.024 5.018 0.000 0.074 0.169
log_living_measure15 0.1323 0.006 21.305 0.000 0.120 0.144
log_lot_measure15 -0.0417 0.009 -4.415 0.000 -0.060 -0.023
ceil_1.5 -0.2138 0.329 -0.649 0.516 -0.859 0.432
ceil_2.0 -0.4957 0.228 -2.176 0.030 -0.942 -0.049
ceil_2.5 -1.9563 1.079 -1.813 0.070 -4.071 0.159
ceil_3.0 -1.2391 0.507 -2.444 0.015 -2.233 -0.245
ceil_3.5 -6.1723 5.824 -1.060 0.289 -17.588 5.243
coast_1.0 0.7973 0.047 16.808 0.000 0.704 0.890
furnished_1.0 0.1376 0.270 0.509 0.610 -0.392 0.667
year_sold_2015 0.1055 0.008 13.363 0.000 0.090 0.121
warm_month_sold_1.0 0.0491 0.009 5.665 0.000 0.032 0.066
zip_price_cat_medium_price -0.0075 0.018 -0.405 0.686 -0.044 0.029
zip_price_cat_high_price 0.1646 0.019 8.733 0.000 0.128 0.201
basement_category_Small Basement 0.0843 0.023 3.713 0.000 0.040 0.129
basement_category_Large Basement 0.0847 0.035 2.388 0.017 0.015 0.154
furnished_log_ceil_measure -0.0117 0.035 -0.336 0.737 -0.080 0.057
ceil_2.0_log_ceil_measure15 0.0702 0.030 2.317 0.021 0.011 0.130
ceil_3.0_log_ceil_measure15 0.1927 0.069 2.805 0.005 0.058 0.327
ceil_1.5_log_ceil_measure15 0.0363 0.045 0.808 0.419 -0.052 0.124
ceil_2.5_log_ceil_measure15 0.2676 0.138 1.943 0.052 -0.002 0.538
ceil_3.5_log_ceil_measure15 0.8743 0.771 1.134 0.257 -0.637 2.386
ceil_2.0_Large Basement 0.0851 0.027 3.182 0.001 0.033 0.137
ceil_2.0_Small Basement 0.0665 0.025 2.682 0.007 0.018 0.115
ceil_3.0_Large Basement -0.2398 0.134 -1.788 0.074 -0.503 0.023
ceil_3.0_Small Basement 0.0091 0.070 0.131 0.896 -0.128 0.146
ceil_1.5_Large Basement 0.0098 0.034 0.289 0.773 -0.057 0.077
ceil_1.5_Small Basement 0.0347 0.035 0.983 0.326 -0.035 0.104
ceil_2.5_Large Basement -0.0080 0.118 -0.068 0.946 -0.240 0.224
ceil_2.5_Small Basement 0.1070 0.107 0.997 0.319 -0.103 0.317
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 382.308 Durbin-Watson: 2.021
Prob(Omnibus): 0.000 Jarque-Bera (JB): 926.456
Skew: -0.022 Prob(JB): 6.65e-202
Kurtosis: 4.211 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False).head(15))
feature VIF 0 const 8984273.554 29 ceil_2.0_log_ceil_measure15 972.899 16 ceil_2.0 919.659 28 furnished_log_ceil_measure 915.109 21 furnished_1.0 870.963 33 ceil_3.5_log_ceil_measure15 673.728 19 ceil_3.5 673.713 31 ceil_1.5_log_ceil_measure15 668.592 15 ceil_1.5 663.622 17 ceil_2.5 654.284 32 ceil_2.5_log_ceil_measure15 653.803 30 ceil_3.0_log_ceil_measure15 539.505 18 ceil_3.0 535.928 10 log_living_measure 47.771 12 log_ceil_measure 44.315
x_train_reg.drop(columns=['ceil_1.5_log_ceil_measure15','ceil_2.0_log_ceil_measure15','ceil_2.5_log_ceil_measure15',
'ceil_3.0_log_ceil_measure15','ceil_3.5_log_ceil_measure15'], inplace=True)
olsmodel3 = build_ols_model(x_train_reg)
print(olsmodel3.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.799
Model: OLS Adj. R-squared: 0.799
Method: Least Squares F-statistic: 1667.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:09 Log-Likelihood: -9328.6
No. Observations: 15129 AIC: 1.873e+04
Df Residuals: 15092 BIC: 1.901e+04
Df Model: 36
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -101.4735 10.918 -9.294 0.000 -122.874 -80.074
room_bed -0.0495 0.006 -8.895 0.000 -0.060 -0.039
room_bath 0.0867 0.007 11.723 0.000 0.072 0.101
sight 0.0969 0.006 17.381 0.000 0.086 0.108
condition 0.1164 0.006 18.386 0.000 0.104 0.129
quality 0.2343 0.007 33.705 0.000 0.221 0.248
yr_built -0.0044 0.000 -21.778 0.000 -0.005 -0.004
yr_renovated 8.019e-05 9.73e-06 8.238 0.000 6.11e-05 9.93e-05
lat 0.8413 0.009 90.395 0.000 0.823 0.860
long -0.5541 0.089 -6.224 0.000 -0.729 -0.380
log_living_measure 0.2009 0.025 8.020 0.000 0.152 0.250
log_lot_measure 0.0552 0.010 5.751 0.000 0.036 0.074
log_ceil_measure 0.1260 0.024 5.217 0.000 0.079 0.173
log_living_measure15 0.1332 0.006 21.481 0.000 0.121 0.145
log_lot_measure15 -0.0415 0.009 -4.398 0.000 -0.060 -0.023
ceil_1.5 0.0433 0.018 2.347 0.019 0.007 0.079
ceil_2.0 0.0280 0.014 1.996 0.046 0.001 0.056
ceil_2.5 0.1145 0.059 1.931 0.054 -0.002 0.231
ceil_3.0 0.1819 0.028 6.422 0.000 0.126 0.237
ceil_3.5 0.4224 0.225 1.876 0.061 -0.019 0.864
coast_1.0 0.8026 0.047 16.930 0.000 0.710 0.896
furnished_1.0 -0.2010 0.238 -0.845 0.398 -0.667 0.265
year_sold_2015 0.1054 0.008 13.347 0.000 0.090 0.121
warm_month_sold_1.0 0.0494 0.009 5.697 0.000 0.032 0.066
zip_price_cat_medium_price -0.0073 0.018 -0.393 0.694 -0.043 0.029
zip_price_cat_high_price 0.1644 0.019 8.725 0.000 0.127 0.201
basement_category_Small Basement 0.0807 0.023 3.556 0.000 0.036 0.125
basement_category_Large Basement 0.0714 0.035 2.027 0.043 0.002 0.140
furnished_log_ceil_measure 0.0333 0.030 1.093 0.274 -0.026 0.093
ceil_2.0_Large Basement 0.0891 0.027 3.341 0.001 0.037 0.141
ceil_2.0_Small Basement 0.0577 0.024 2.392 0.017 0.010 0.105
ceil_3.0_Large Basement -0.1331 0.125 -1.067 0.286 -0.378 0.111
ceil_3.0_Small Basement 0.0231 0.070 0.331 0.741 -0.113 0.160
ceil_1.5_Large Basement 0.0141 0.034 0.413 0.680 -0.053 0.081
ceil_1.5_Small Basement 0.0344 0.035 0.975 0.330 -0.035 0.104
ceil_2.5_Large Basement 0.0515 0.112 0.459 0.646 -0.168 0.272
ceil_2.5_Small Basement 0.0498 0.101 0.494 0.622 -0.148 0.248
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 378.123 Durbin-Watson: 2.020
Prob(Omnibus): 0.000 Jarque-Bera (JB): 911.300
Skew: -0.022 Prob(JB): 1.30e-198
Kurtosis: 4.202 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False).head(15))
feature VIF 0 const 8951738.633 28 furnished_log_ceil_measure 696.731 21 furnished_1.0 674.607 10 log_living_measure 47.124 12 log_ceil_measure 43.795 27 basement_category_Large Basement 15.130 11 log_lot_measure 6.926 14 log_lot_measure15 6.703 25 zip_price_cat_high_price 6.575 24 zip_price_cat_medium_price 6.407 26 basement_category_Small Basement 5.917 5 quality 5.020 16 ceil_2.0 3.497 13 log_living_measure15 2.886 6 yr_built 2.659
x_train_reg.drop(columns=['furnished_log_ceil_measure'], inplace=True)
olsmodel4 = build_ols_model(x_train_reg)
print(olsmodel4.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.799
Model: OLS Adj. R-squared: 0.799
Method: Least Squares F-statistic: 1714.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:11 Log-Likelihood: -9329.2
No. Observations: 15129 AIC: 1.873e+04
Df Residuals: 15093 BIC: 1.900e+04
Df Model: 35
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -101.4323 10.918 -9.291 0.000 -122.832 -80.032
room_bed -0.0498 0.006 -8.953 0.000 -0.061 -0.039
room_bath 0.0879 0.007 12.026 0.000 0.074 0.102
sight 0.0966 0.006 17.346 0.000 0.086 0.107
condition 0.1161 0.006 18.354 0.000 0.104 0.128
quality 0.2348 0.007 33.875 0.000 0.221 0.248
yr_built -0.0044 0.000 -22.035 0.000 -0.005 -0.004
yr_renovated 7.938e-05 9.71e-06 8.179 0.000 6.04e-05 9.84e-05
lat 0.8418 0.009 90.606 0.000 0.824 0.860
long -0.5539 0.089 -6.222 0.000 -0.728 -0.379
log_living_measure 0.2009 0.025 8.021 0.000 0.152 0.250
log_lot_measure 0.0556 0.010 5.799 0.000 0.037 0.074
log_ceil_measure 0.1278 0.024 5.304 0.000 0.081 0.175
log_living_measure15 0.1331 0.006 21.474 0.000 0.121 0.145
log_lot_measure15 -0.0414 0.009 -4.382 0.000 -0.060 -0.023
ceil_1.5 0.0422 0.018 2.290 0.022 0.006 0.078
ceil_2.0 0.0277 0.014 1.969 0.049 0.000 0.055
ceil_2.5 0.1135 0.059 1.914 0.056 -0.003 0.230
ceil_3.0 0.1808 0.028 6.389 0.000 0.125 0.236
ceil_3.5 0.4212 0.225 1.870 0.061 -0.020 0.863
coast_1.0 0.8040 0.047 16.965 0.000 0.711 0.897
furnished_1.0 0.0584 0.015 3.803 0.000 0.028 0.088
year_sold_2015 0.1054 0.008 13.347 0.000 0.090 0.121
warm_month_sold_1.0 0.0494 0.009 5.694 0.000 0.032 0.066
zip_price_cat_medium_price -0.0077 0.018 -0.417 0.677 -0.044 0.029
zip_price_cat_high_price 0.1640 0.019 8.704 0.000 0.127 0.201
basement_category_Small Basement 0.0804 0.023 3.542 0.000 0.036 0.125
basement_category_Large Basement 0.0694 0.035 1.973 0.049 0.000 0.138
ceil_2.0_Large Basement 0.0901 0.027 3.379 0.001 0.038 0.142
ceil_2.0_Small Basement 0.0557 0.024 2.318 0.020 0.009 0.103
ceil_3.0_Large Basement -0.1245 0.124 -1.000 0.317 -0.369 0.120
ceil_3.0_Small Basement 0.0229 0.070 0.329 0.742 -0.114 0.159
ceil_1.5_Large Basement 0.0143 0.034 0.420 0.674 -0.052 0.081
ceil_1.5_Small Basement 0.0337 0.035 0.954 0.340 -0.036 0.103
ceil_2.5_Large Basement 0.0576 0.112 0.514 0.607 -0.162 0.277
ceil_2.5_Small Basement 0.0487 0.101 0.483 0.629 -0.149 0.247
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 379.051 Durbin-Watson: 2.020
Prob(Omnibus): 0.000 Jarque-Bera (JB): 915.301
Skew: -0.020 Prob(JB): 1.76e-199
Kurtosis: 4.204 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False).head(15))
feature VIF 0 const 8951631.714 10 log_living_measure 47.124 12 log_ceil_measure 43.591 27 basement_category_Large Basement 15.089 11 log_lot_measure 6.916 14 log_lot_measure15 6.701 25 zip_price_cat_high_price 6.572 24 zip_price_cat_medium_price 6.404 26 basement_category_Small Basement 5.916 5 quality 4.993 16 ceil_2.0 3.495 13 log_living_measure15 2.886 21 furnished_1.0 2.813 6 yr_built 2.626 2 room_bath 2.335
x_train_reg.drop(columns=['log_living_measure'], inplace=True)
olsmodel5 = build_ols_model(x_train_reg)
print(olsmodel5.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.798
Model: OLS Adj. R-squared: 0.798
Method: Least Squares F-statistic: 1756.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:13 Log-Likelihood: -9361.3
No. Observations: 15129 AIC: 1.879e+04
Df Residuals: 15094 BIC: 1.906e+04
Df Model: 34
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -101.7102 10.941 -9.297 0.000 -123.155 -80.265
room_bed -0.0450 0.006 -8.116 0.000 -0.056 -0.034
room_bath 0.0938 0.007 12.861 0.000 0.079 0.108
sight 0.0967 0.006 17.332 0.000 0.086 0.108
condition 0.1202 0.006 19.035 0.000 0.108 0.133
quality 0.2365 0.007 34.058 0.000 0.223 0.250
yr_built -0.0044 0.000 -21.928 0.000 -0.005 -0.004
yr_renovated 8.266e-05 9.72e-06 8.506 0.000 6.36e-05 0.000
lat 0.8421 0.009 90.442 0.000 0.824 0.860
long -0.5548 0.089 -6.218 0.000 -0.730 -0.380
log_lot_measure 0.0558 0.010 5.798 0.000 0.037 0.075
log_ceil_measure 0.3070 0.009 33.933 0.000 0.289 0.325
log_living_measure15 0.1363 0.006 21.992 0.000 0.124 0.148
log_lot_measure15 -0.0399 0.009 -4.217 0.000 -0.058 -0.021
ceil_1.5 0.0496 0.018 2.689 0.007 0.013 0.086
ceil_2.0 0.0469 0.014 3.384 0.001 0.020 0.074
ceil_2.5 0.1335 0.059 2.249 0.025 0.017 0.250
ceil_3.0 0.1892 0.028 6.674 0.000 0.134 0.245
ceil_3.5 0.4420 0.226 1.959 0.050 -0.000 0.884
coast_1.0 0.8088 0.047 17.031 0.000 0.716 0.902
furnished_1.0 0.0618 0.015 4.018 0.000 0.032 0.092
year_sold_2015 0.1051 0.008 13.286 0.000 0.090 0.121
warm_month_sold_1.0 0.0493 0.009 5.674 0.000 0.032 0.066
zip_price_cat_medium_price -0.0066 0.019 -0.356 0.722 -0.043 0.030
zip_price_cat_high_price 0.1659 0.019 8.789 0.000 0.129 0.203
basement_category_Small Basement 0.2238 0.014 15.985 0.000 0.196 0.251
basement_category_Large Basement 0.3250 0.015 21.717 0.000 0.296 0.354
ceil_2.0_Large Basement -0.0126 0.023 -0.536 0.592 -0.058 0.033
ceil_2.0_Small Basement -0.0049 0.023 -0.214 0.830 -0.050 0.040
ceil_3.0_Large Basement -0.2312 0.124 -1.864 0.062 -0.474 0.012
ceil_3.0_Small Basement -0.0479 0.069 -0.692 0.489 -0.184 0.088
ceil_1.5_Large Basement -0.0357 0.034 -1.065 0.287 -0.101 0.030
ceil_1.5_Small Basement -0.0067 0.035 -0.191 0.849 -0.075 0.062
ceil_2.5_Large Basement -0.0570 0.111 -0.512 0.609 -0.275 0.161
ceil_2.5_Small Basement -0.0355 0.101 -0.352 0.725 -0.233 0.162
ceil_3.5_Large Basement 0 0 nan nan 0 0
ceil_3.5_Small Basement 0 0 nan nan 0 0
==============================================================================
Omnibus: 379.353 Durbin-Watson: 2.018
Prob(Omnibus): 0.000 Jarque-Bera (JB): 915.348
Skew: -0.023 Prob(JB): 1.72e-199
Kurtosis: 4.204 Cond. No. 1.00e+16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.91e-22. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
# Check VIF
print(checking_vif(x_train_reg).sort_values(by='VIF', ascending=False))
feature VIF 0 const 8951541.568 10 log_lot_measure 6.916 13 log_lot_measure15 6.699 24 zip_price_cat_high_price 6.571 23 zip_price_cat_medium_price 6.404 11 log_ceil_measure 6.120 5 quality 4.988 15 ceil_2.0 3.393 12 log_living_measure15 2.874 20 furnished_1.0 2.811 26 basement_category_Large Basement 2.718 6 yr_built 2.626 2 room_bath 2.312 25 basement_category_Small Basement 2.243 14 ceil_1.5 2.072 16 ceil_2.5 1.972 28 ceil_2.0_Small Basement 1.891 1 room_bed 1.858 27 ceil_2.0_Large Basement 1.732 17 ceil_3.0 1.667 31 ceil_1.5_Large Basement 1.580 34 ceil_2.5_Small Basement 1.548 32 ceil_1.5_Small Basement 1.524 33 ceil_2.5_Large Basement 1.409 3 sight 1.387 8 lat 1.378 4 condition 1.259 19 coast_1.0 1.195 30 ceil_3.0_Small Basement 1.181 7 yr_renovated 1.164 29 ceil_3.0_Large Basement 1.064 21 year_sold_2015 1.027 9 long 1.023 22 warm_month_sold_1.0 1.022 18 ceil_3.5 1.007 35 ceil_3.5_Large Basement NaN 36 ceil_3.5_Small Basement NaN
x_train_reg['ceil_3.5_Large Basement'].unique()
array([0.])
x_train_reg['ceil_3.5_Small Basement'].unique()
array([0.])
x_train_im.loc[x_train_im['ceil_3.5']==1.0, ['basement','basement_category_Small Basement', 'basement_category_Large Basement']]
| basement | basement_category_Small Basement | basement_category_Large Basement | |
|---|---|---|---|
| 842 | 0.000 | 0.000 | 0.000 |
| 10493 | 0.000 | 0.000 | 0.000 |
| 20957 | 0.000 | 0.000 | 0.000 |
| 6361 | 0.000 | 0.000 | 0.000 |
To make the model more complete, I am combining ceil_3.0 and ceil3.5 vars and naming their replacement column as ceil>=3.0. I then replace the flawed interaction terms with the new interaction terms that combine the basement categories with the new ceil column. I then rerun the model.
x_test_reg = x_test_reg[x_train_reg.columns]
x_train_reg2 = x_train_reg.copy()
x_test_reg2 = x_test_reg.copy()
x_train_reg2.rename(columns={'ceil_3.0':'ceil_>=3.0'}, inplace=True)
x_test_reg2.rename(columns={'ceil_3.0':'ceil_>=3.0'}, inplace=True)
x_train_reg2.loc[(x_train_reg2['ceil_3.5']==1.0), 'ceil_>=3.0'] = 1.0
x_test_reg2.loc[(x_test_reg2['ceil_3.5']==1.0), 'ceil_>=3.0'] = 1.0
x_train_reg2.drop(columns=['ceil_3.5','ceil_3.0_Small Basement','ceil_3.0_Large Basement','ceil_3.5_Small Basement',
'ceil_3.5_Large Basement'], inplace=True)
x_test_reg2.drop(columns=['ceil_3.5','ceil_3.0_Small Basement','ceil_3.0_Large Basement','ceil_3.5_Small Basement',
'ceil_3.5_Large Basement'], inplace=True)
# Creating interactions between each ceil and basement categories. Using a for loop since there are so many to make
for i in ['Small Basement','Large Basement']:
x_train_reg2[f'ceil_>=3.0_{i}'] = x_train_reg2['ceil_>=3.0']*x_train_reg2[f'basement_category_{i}']
x_test_reg2[f'ceil_>=3.0_{i}'] = x_test_reg2['ceil_>=3.0']*x_test_reg2[f'basement_category_{i}']
olsmodel6 = build_ols_model(x_train_reg2)
print(olsmodel6.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.798
Model: OLS Adj. R-squared: 0.798
Method: Least Squares F-statistic: 1809.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:15 Log-Likelihood: -9362.0
No. Observations: 15129 AIC: 1.879e+04
Df Residuals: 15095 BIC: 1.905e+04
Df Model: 33
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -101.7115 10.941 -9.297 0.000 -123.156 -80.267
room_bed -0.0450 0.006 -8.129 0.000 -0.056 -0.034
room_bath 0.0937 0.007 12.856 0.000 0.079 0.108
sight 0.0967 0.006 17.342 0.000 0.086 0.108
condition 0.1202 0.006 19.034 0.000 0.108 0.133
quality 0.2364 0.007 34.048 0.000 0.223 0.250
yr_built -0.0044 0.000 -21.926 0.000 -0.005 -0.004
yr_renovated 8.265e-05 9.72e-06 8.505 0.000 6.36e-05 0.000
lat 0.8421 0.009 90.443 0.000 0.824 0.860
long -0.5548 0.089 -6.219 0.000 -0.730 -0.380
log_lot_measure 0.0558 0.010 5.805 0.000 0.037 0.075
log_ceil_measure 0.3072 0.009 33.967 0.000 0.289 0.325
log_living_measure15 0.1363 0.006 21.990 0.000 0.124 0.148
log_lot_measure15 -0.0400 0.009 -4.228 0.000 -0.059 -0.021
ceil_1.5 0.0495 0.018 2.685 0.007 0.013 0.086
ceil_2.0 0.0468 0.014 3.377 0.001 0.020 0.074
ceil_2.5 0.1334 0.059 2.247 0.025 0.017 0.250
ceil_>=3.0 0.1918 0.028 6.791 0.000 0.136 0.247
coast_1.0 0.8085 0.047 17.025 0.000 0.715 0.902
furnished_1.0 0.0617 0.015 4.009 0.000 0.032 0.092
year_sold_2015 0.1052 0.008 13.295 0.000 0.090 0.121
warm_month_sold_1.0 0.0494 0.009 5.685 0.000 0.032 0.066
zip_price_cat_medium_price -0.0067 0.019 -0.359 0.719 -0.043 0.030
zip_price_cat_high_price 0.1660 0.019 8.792 0.000 0.129 0.203
basement_category_Small Basement 0.2239 0.014 15.992 0.000 0.196 0.251
basement_category_Large Basement 0.3251 0.015 21.725 0.000 0.296 0.354
ceil_2.0_Large Basement -0.0126 0.023 -0.536 0.592 -0.059 0.033
ceil_2.0_Small Basement -0.0049 0.023 -0.213 0.831 -0.050 0.040
ceil_1.5_Large Basement -0.0357 0.034 -1.066 0.287 -0.101 0.030
ceil_1.5_Small Basement -0.0067 0.035 -0.193 0.847 -0.075 0.062
ceil_2.5_Large Basement -0.0570 0.111 -0.512 0.609 -0.275 0.161
ceil_2.5_Small Basement -0.0355 0.101 -0.353 0.724 -0.233 0.162
ceil_>=3.0_Small Basement -0.0507 0.069 -0.733 0.464 -0.186 0.085
ceil_>=3.0_Large Basement -0.2340 0.124 -1.887 0.059 -0.477 0.009
==============================================================================
Omnibus: 379.204 Durbin-Watson: 2.018
Prob(Omnibus): 0.000 Jarque-Bera (JB): 914.810
Skew: -0.023 Prob(JB): 2.25e-199
Kurtosis: 4.204 Cond. No. 5.92e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.92e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
x_train_reg2 = x_train_reg2.loc[:,x_train_reg2.columns[:-8]]
olsmodel7 = build_ols_model(x_train_reg2)
print(olsmodel7.summary())
OLS Regression Results
==============================================================================
Dep. Variable: log_price R-squared: 0.798
Model: OLS Adj. R-squared: 0.798
Method: Least Squares F-statistic: 2388.
Date: Thu, 20 Jan 2022 Prob (F-statistic): 0.00
Time: 06:39:15 Log-Likelihood: -9364.5
No. Observations: 15129 AIC: 1.878e+04
Df Residuals: 15103 BIC: 1.898e+04
Df Model: 25
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -101.5906 10.932 -9.293 0.000 -123.018 -80.163
room_bed -0.0444 0.005 -8.118 0.000 -0.055 -0.034
room_bath 0.0932 0.007 12.958 0.000 0.079 0.107
sight 0.0959 0.006 17.308 0.000 0.085 0.107
condition 0.1201 0.006 19.064 0.000 0.108 0.132
quality 0.2369 0.007 34.183 0.000 0.223 0.251
yr_built -0.0044 0.000 -22.059 0.000 -0.005 -0.004
yr_renovated 8.281e-05 9.71e-06 8.532 0.000 6.38e-05 0.000
lat 0.8426 0.009 90.758 0.000 0.824 0.861
long -0.5531 0.089 -6.205 0.000 -0.728 -0.378
log_lot_measure 0.0561 0.010 5.841 0.000 0.037 0.075
log_ceil_measure 0.3056 0.009 34.091 0.000 0.288 0.323
log_living_measure15 0.1374 0.006 22.457 0.000 0.125 0.149
log_lot_measure15 -0.0403 0.009 -4.260 0.000 -0.059 -0.022
ceil_1.5 0.0410 0.015 2.777 0.005 0.012 0.070
ceil_2.0 0.0432 0.012 3.561 0.000 0.019 0.067
ceil_2.5 0.1135 0.044 2.590 0.010 0.028 0.199
ceil_>=3.0 0.1757 0.026 6.674 0.000 0.124 0.227
coast_1.0 0.8099 0.047 17.065 0.000 0.717 0.903
furnished_1.0 0.0597 0.015 3.912 0.000 0.030 0.090
year_sold_2015 0.1054 0.008 13.326 0.000 0.090 0.121
warm_month_sold_1.0 0.0493 0.009 5.677 0.000 0.032 0.066
zip_price_cat_medium_price -0.0071 0.019 -0.383 0.702 -0.043 0.029
zip_price_cat_high_price 0.1656 0.019 8.776 0.000 0.129 0.203
basement_category_Small Basement 0.2191 0.011 19.841 0.000 0.197 0.241
basement_category_Large Basement 0.3157 0.013 25.225 0.000 0.291 0.340
==============================================================================
Omnibus: 382.108 Durbin-Watson: 2.018
Prob(Omnibus): 0.000 Jarque-Bera (JB): 924.921
Skew: -0.025 Prob(JB): 1.43e-201
Kurtosis: 4.210 Cond. No. 5.91e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.91e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
x_test_reg2 = x_test_reg2[x_train_reg2.columns]
# Finally, creating the same regression with sklearn package so that a function can easily produce metrics as the rest
# of the models use sklearn package
l_reg = LinearRegression(fit_intercept=False)
l_reg.fit(x_train_reg2,y_train_scaled)
pred_train_log = l_reg.predict(x_train_reg2)
pred_test_log = l_reg.predict(x_test_reg2)
pred_train = inv_transformation(pred_train_log, log=True, normalized = True, source=y_train_log)
pred_test = inv_transformation(pred_test_log, log=True, normalized = True, source=y_test_log)
# normalized=True,
metrics_list = get_model_score(Y_train=y_train_scaled,Y_test=y_test_scaled,p_train=pred_train_log,p_test=pred_test_log)
R-sqaure on training set : 0.7980867468957445 R-square on test set : 0.7184307912168573 RMSE on training set : 0.4493475860670173 RMSE on test set : 0.5306309534725078 MAE on training set : 0.5854956449184814 MAE on test set : 0.6473823753390626 MAPE on training set : 1.346482190655996 MAPE on test set : 1.5205486561797315 Economic Cost on training set : 0.018616526116945916 Economic Cost on test set : -0.060099859418995885
metrics_list = get_model_score(Y_train=y_train,Y_test=y_test,p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.7702341673695324 R-square on test set : 0.7251029495266087 RMSE on training set : 176139.5267446333 RMSE on test set : 192470.5256674728 MAE on training set : 318.5530935674518 MAE on test set : 340.97879587536937 MAPE on training set : 0.43056779672242873 MAPE on test set : 0.47782236834703007 Economic Cost on training set : 282935.40478617436 Economic Cost on test set : 258491.79625910538
We must check that the OLS assumptions are satisifed before walking away from the OLS regression.
# Checking that the residual mean is 0
residuals = olsmodel7.resid
print(f'Residual mean: {np.mean(residuals)}')
Residual mean: -2.7316755360314384e-14
The residual mean is approximately 0.
Test for heteroskedasticity using the Goldfeldquandt test.
H0 : Residuals are homoscedastic
HA : Residuals have hetroscedasticity
alpha = 0.05
from statsmodels.compat import lzip
#Checking for heteroskedasticity
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(residuals, x_train_reg2)
lzip(name, test)
[('F statistic', 0.977470705257121), ('p-value', 0.8387147087702829)]
We fail to reject the null hypothesis, meaning that the model does not suffer from heteroskedasticity.
Below the residuals are plotted against the fitted values and followed by a distribution plot of the residuals.
from yellowbrick.regressor import ResidualsPlot
# Instantiate the linear model and visualizer
model = l_reg
visualizer = ResidualsPlot(model)
visualizer.fit(x_train_reg2, y_train_scaled) # Fit the training data to the visualizer
visualizer.score(x_test_reg2, y_test_scaled) # Evaluate the model on the test data
visualizer.show()
<AxesSubplot:title={'center':'Residuals for LinearRegression Model'}, xlabel='Predicted Value', ylabel='Residuals'>
The residuals cluster into a mostly spherical cloud like desired. There are a few outliers on the tails that deviate the mean from 0, especially for the lower fitted values. But, I already checked the extreme values in the explantory vars and there was no reason to believe these values were outside of the natual processes for these vars. Overall, I think that this is a reasonable residual plot. The residuals are normally distributed in the distribution plot on the side.
# Plot q-q plot of residuals
import pylab
import scipy.stats as stats
stats.probplot(residuals, dist="norm", plot=pylab)
plt.show()
# Plotting observed and predicted values
fig, ax = plt.subplots(figsize=(8, 6))
y_pred = l_reg.predict(x_test_reg2)
ax.scatter(y_test_scaled, y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test_scaled.min(), y_test_scaled.max()], [y_test_scaled.min(), y_test_scaled.max()], 'k--', lw=3)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title("Observed vs Predicted")
plt.grid()
plt.show()
Ridge regression is a regularization algorithm that reduces the chance of overfitting by biasing the model applying the L1 penalty to the original model. The result is a model with coefficients reduced towards zero, but not exactly zero, and that has high bias, but low variance. An important component of the L1 penalty is the scalar term alpha. What is the best value for alpha? GridSearchCV is applied to determine the best value for alpha.
# Creating the Ridge object
ridge_tuned = Ridge()
# Grid of parameters to choose from
parameters = {'alpha': np.arange(1, 10, 0.01)}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj= GridSearchCV(ridge_tuned, parameters, scoring='neg_mean_squared_error', cv=5)
grid_obj = grid_obj.fit(x_train_RnL, y_train_scaled)
# Set the clf to the best combination of parameters
ridge_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
ridge_tuned.fit(x_train_RnL, y_train_scaled)
Ridge(alpha=1.1900000000000002)
pred_train_rnl = ridge_tuned.predict(x_train_RnL)
pred_test_rnl = ridge_tuned.predict(x_test_RnL)
pred_train = inv_transformation(pred_train_rnl, log=True, normalized=True,source=y_train_log)
pred_test = inv_transformation(pred_test_rnl, log=True, normalized=True,source=y_test_log)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_scaled,Y_test=y_test_scaled, p_train=pred_train_rnl,p_test=pred_test_rnl)
R-sqaure on training set : 0.8026420362796596 R-square on test set : 0.723945446423107 RMSE on training set : 0.4442498888242298 RMSE on test set : 0.5254089393766468 MAE on training set : 0.5819111917206554 MAE on test set : 0.6446859270696119 MAPE on training set : 1.3430670947813241 MAPE on test set : 1.5276310300895897 Economic Cost on training set : 0.01622054271272895 Economic Cost on test set : -0.059347929838770866
# Metrics on the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.7829038860194857 R-square on test set : 0.7350030106367794 RMSE on training set : 171214.3361574935 RMSE on test set : 188972.95853196934 MAE on training set : 315.49600296835933 MAE on test set : 339.14359597941205 MAPE on training set : 0.4281743320666672 MAPE on test set : 0.47524459052149165 Economic Cost on training set : 280878.5069193036 Economic Cost on test set : 254720.82906606342
# Let us explore the coefficients for each of the independent attributes
for idx, col_name in enumerate(x_train_RnL.columns):
print("The coefficient for {} is {}".format(col_name, ridge_tuned.coef_[0][idx]))
The coefficient for room_bed is -0.04654028006352154 The coefficient for room_bath is 0.07799404660998734 The coefficient for sight is 0.09328691692111904 The coefficient for condition is 0.11940511393024167 The coefficient for quality is 0.23287930119770578 The coefficient for yr_built is -0.004490492744514443 The coefficient for yr_renovated is 7.93778685266897e-05 The coefficient for lat is 0.8398140649543245 The coefficient for long is -0.5220022672094843 The coefficient for log_living_measure is 0.1796128979930314 The coefficient for log_lot_measure is 0.0454641750872246 The coefficient for log_ceil_measure is 0.11285796790856878 The coefficient for log_living_measure15 is 0.18045087408039523 The coefficient for log_lot_measure15 is -0.03643346027619597 The coefficient for ceil_1.5 is -0.07110012628580223 The coefficient for ceil_2.0 is -0.08599864612234794 The coefficient for ceil_2.5 is -0.08196755757095804 The coefficient for ceil_3.0 is -0.17035634041966646 The coefficient for ceil_3.5 is -0.01960166232117256 The coefficient for coast_1.0 is 0.8002906465993909 The coefficient for furnished_1.0 is 0.9119268035908837 The coefficient for year_sold_2015 is 0.10301667640335042 The coefficient for warm_month_sold_1.0 is 0.04831877406362686 The coefficient for zip_price_cat_medium_price is -0.0033419666837279285 The coefficient for zip_price_cat_high_price is 0.16773848950776157 The coefficient for basement_category_Small Basement is 0.07522604211844336 The coefficient for basement_category_Large Basement is 0.06217555311038209 The coefficient for furnished_log_ceil_measure is -0.08458680552922894 The coefficient for furnished_log_living_measure is 0.5797663877691814 The coefficient for furnished_log_living_measure15 is -0.6124109822744778 The coefficient for ceil_2.0_log_ceil_measure15 is 0.017069505019807554 The coefficient for ceil_3.0_log_ceil_measure15 is 0.04928681660230201 The coefficient for ceil_1.5_log_ceil_measure15 is 0.01642969812406037 The coefficient for ceil_2.5_log_ceil_measure15 is 0.02509766772495992 The coefficient for ceil_3.5_log_ceil_measure15 is 0.06411671942107901 The coefficient for ceil_2.0_Large Basement is 0.020548970739005336 The coefficient for ceil_2.0_Small Basement is 0.05600808009145712 The coefficient for ceil_3.0_Large Basement is -0.3228630508804215 The coefficient for ceil_3.0_Small Basement is 0.004564493282245821 The coefficient for ceil_1.5_Large Basement is 0.01965503510913235 The coefficient for ceil_1.5_Small Basement is 0.03870539086432983 The coefficient for ceil_2.5_Large Basement is -0.05392839329009163 The coefficient for ceil_2.5_Small Basement is 0.04364514361594192 The coefficient for ceil_3.5_Large Basement is 0.0 The coefficient for ceil_3.5_Small Basement is 0.0
# Checking that the residual mean is 0
residuals = ridge_tuned.predict(x_train_RnL)
print(f'Residual mean: {np.mean(residuals)}')
Residual mean: -3.0841377732761096e-14
The residual mean is approximately 0.
#Checking for heteroskedasticity
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(residuals, x_train_RnL)
lzip(name, test)
[('F statistic', 1.0062476414338963), ('p-value', 0.39355441788211276)]
The test fails to reject the null hypothesis that the errors are heteroskedastic.
# Instantiate the linear model and visualizer
model = ridge_tuned
visualizer = ResidualsPlot(model)
visualizer.fit(x_train_RnL, y_train_scaled) # Fit the training data to the visualizer
visualizer.score(x_test_RnL, y_test_scaled) # Evaluate the model on the test data
visualizer.show()
<AxesSubplot:title={'center':'Residuals for Ridge Model'}, xlabel='Predicted Value', ylabel='Residuals'>
Residual cloud like desired with a a normally distributed residual term around 0 in the side plot.
stats.probplot(residuals.flatten(), dist="norm", plot=pylab)
plt.show()
# Plotting observed and predicted values
fig, ax = plt.subplots(figsize=(8, 6))
y_pred = ridge_tuned.predict(x_test_RnL)
ax.scatter(y_test_scaled, y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test_scaled.min(), y_test_scaled.max()], [y_test_scaled.min(), y_test_scaled.max()], 'k--', lw=3)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title("Observed vs Predicted")
plt.grid()
plt.show()
# Creating the Ridge object
lasso_tuned = Lasso()
# Grid of parameters to choose from
parameters = {'alpha': np.arange(0.1, 10, 0.01)}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj= GridSearchCV(lasso_tuned, parameters, scoring='neg_mean_squared_error', cv=5)
grid_obj = grid_obj.fit(x_train_RnL, y_train_scaled)
# Set the clf to the best combination of parameters
lasso_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
lasso_tuned.fit(x_train_RnL, y_train_scaled)
Lasso(alpha=0.1)
pred_train_rnl = lasso_tuned.predict(x_train_RnL)
pred_test_rnl = lasso_tuned.predict(x_test_RnL)
pred_train = inv_transformation(pred_train_rnl, log=True, normalized=True,source=y_train_log)
pred_test = inv_transformation(pred_test_rnl, log=True, normalized=True,source=y_test_log)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_scaled,Y_test=y_test_scaled, p_train=pred_train_rnl,p_test=pred_test_rnl)
R-sqaure on training set : 0.7021441058734572 R-square on test set : 0.624738855049096 RMSE on training set : 0.5457617558299069 RMSE on test set : 0.6125856225466804 MAE on training set : 0.6500322131823726 MAE on test set : 0.6975051302350815 MAPE on training set : 1.3720957811697518 MAPE on test set : 1.5614076043519087 Economic Cost on training set : -0.06818053575636995 Economic Cost on test set : -0.02449977307447958
# Metrics for the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.6182265184707811 R-square on test set : 0.5562618664654669 RMSE on training set : 227047.63405781268 RMSE on test set : 244535.867454023 MAE on training set : 352.3174769491871 MAE on test set : 371.8742346095329 MAPE on training set : 0.47802814998841303 MAPE on test set : 0.507029834406714 Economic Cost on training set : 278756.6612973162 Economic Cost on test set : 261760.95156887238
# Let us explore the coefficients for each of the independent attributes
for idx, col_name in enumerate(x_train_RnL.columns):
print("The coefficient for {} is {}".format(col_name, lasso_tuned.coef_[idx]))
The coefficient for room_bed is 0.0 The coefficient for room_bath is 0.0 The coefficient for sight is 0.00038138603662250353 The coefficient for condition is 0.0 The coefficient for quality is 0.27323038629929614 The coefficient for yr_built is -0.0060939421949668645 The coefficient for yr_renovated is 0.00011344691277117903 The coefficient for lat is 0.3816682454324212 The coefficient for long is -0.0 The coefficient for log_living_measure is 0.25591548876309295 The coefficient for log_lot_measure is -0.0 The coefficient for log_ceil_measure is 0.0 The coefficient for log_living_measure15 is 0.057672265575375956 The coefficient for log_lot_measure15 is -0.0 The coefficient for ceil_1.5 is 0.0 The coefficient for ceil_2.0 is 0.0 The coefficient for ceil_2.5 is 0.0 The coefficient for ceil_3.0 is 0.0 The coefficient for ceil_3.5 is 0.0 The coefficient for coast_1.0 is 0.0 The coefficient for furnished_1.0 is 0.0 The coefficient for year_sold_2015 is 0.0 The coefficient for warm_month_sold_1.0 is 0.0 The coefficient for zip_price_cat_medium_price is -0.0 The coefficient for zip_price_cat_high_price is 0.0 The coefficient for basement_category_Small Basement is 0.0 The coefficient for basement_category_Large Basement is 0.0 The coefficient for furnished_log_ceil_measure is 0.0 The coefficient for furnished_log_living_measure is 0.04291943536276795 The coefficient for furnished_log_living_measure15 is 0.0 The coefficient for ceil_2.0_log_ceil_measure15 is 0.0029131554618214 The coefficient for ceil_3.0_log_ceil_measure15 is 0.0 The coefficient for ceil_1.5_log_ceil_measure15 is 0.0 The coefficient for ceil_2.5_log_ceil_measure15 is 0.0 The coefficient for ceil_3.5_log_ceil_measure15 is 0.0 The coefficient for ceil_2.0_Large Basement is 0.0 The coefficient for ceil_2.0_Small Basement is 0.0 The coefficient for ceil_3.0_Large Basement is 0.0 The coefficient for ceil_3.0_Small Basement is 0.0 The coefficient for ceil_1.5_Large Basement is 0.0 The coefficient for ceil_1.5_Small Basement is 0.0 The coefficient for ceil_2.5_Large Basement is 0.0 The coefficient for ceil_2.5_Small Basement is 0.0 The coefficient for ceil_3.5_Large Basement is 0.0 The coefficient for ceil_3.5_Small Basement is 0.0
# Checking the mean
prediction = lasso_tuned.predict(x_train_RnL)
residuals = (y_train_scaled.log_price - prediction)
print(f'Residual mean: {np.mean(residuals)}')
Residual mean: 3.234756489224463e-17
The residual mean is approximately 0.
#Checking for heteroskedasticity
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(residuals, x_train_RnL)
lzip(name, test)
[('F statistic', 0.969129012778069), ('p-value', 0.9130362812495011)]
The test fails to reject the null hypothesis that the errors are heteroskedastic.
# Residual plots
model = lasso_tuned
visualizer = ResidualsPlot(model)
visualizer.fit(x_train_RnL, y_train_scaled.log_price) # Fit the training data to the visualizer
visualizer.score(x_test_RnL, y_test_scaled.log_price) # Evaluate the model on the test data
visualizer.show()
<AxesSubplot:title={'center':'Residuals for Lasso Model'}, xlabel='Predicted Value', ylabel='Residuals'>
Residual cloud like desired in the main plot, althought it could be more circular. On the side plot, we see a approximately normally distributed residual term around 0.
stats.probplot(residuals, dist="norm", plot=pylab)
plt.show()
# Plotting observed and predicted values
fig, ax = plt.subplots(figsize=(8, 6))
y_pred = lasso_tuned.predict(x_test_RnL)
ax.scatter(y_test_scaled, y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test_scaled.min(), y_test_scaled.max()], [y_test_scaled.min(), y_test_scaled.max()], 'k--', lw=3)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title("Observed vs Predicted")
plt.grid()
plt.show()
# Choose the type of classifier.
rf_tuned = RandomForestRegressor(random_state=5)
# Grid of parameters to choose from
parameters = {
'max_depth':[4, 6, 8, 10, None],
'max_features': ['sqrt','log2',None],
'n_estimators': [80, 90, 100, 110, 120]
}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj = GridSearchCV(rf_tuned, parameters, scoring=scorer,cv=5)
grid_obj = grid_obj.fit(x_train_xg, y_train_xg)
# Set the clf to the best combination of parameters
rf_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
rf_tuned.fit(x_train_xg, y_train_xg)
RandomForestRegressor(max_features=None, n_estimators=110, random_state=5)
pred_train_rf = rf_tuned.predict(x_train_xg)
pred_test_rf = rf_tuned.predict(x_test_xg)
pred_train = inv_transformation(pred_train_rf, normalized=True,source=y_train)
pred_test = inv_transformation(pred_test_rnl, normalized=True,source=y_test)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_xg,Y_test=y_test_xg, p_train=pred_train_rf,p_test=pred_test_rf)
R-sqaure on training set : 0.9713740650735649 R-square on test set : 0.8166702927854737 RMSE on training set : 0.16919200609495416 RMSE on test set : 0.42817018487340563 MAE on training set : 0.3056271897310597 MAE on test set : 0.49382532949768837 MAPE on training set : 0.7878210735057669 MAPE on test set : 1.4897609829997753 Economic Cost on training set : -0.06712243643452027 Economic Cost on test set : 0.016712720754685433
# Metrics for the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list =get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.9713770047634076 R-square on test set : 0.5617926273613455 RMSE on training set : 62168.67919019812 RMSE on test set : 243007.1390417179 MAE on training set : 185.263057140273 MAE on test set : 375.06326957937955 MAPE on training set : 0.2538684704374979 MAPE on test set : 0.5258444751442147 Economic Cost on training set : 265497.553027732 Economic Cost on test set : 303250.18670790346
# Choose the type of classifier.
ad_tuned = AdaBoostRegressor(random_state=5)
# Grid of parameters to choose from
parameters = {'n_estimators': np.arange(10,100,10),
'learning_rate': np.arange(0.01,1,0.1),
}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj = GridSearchCV(ad_tuned, parameters, scoring=scorer,cv=5)
grid_obj = grid_obj.fit(x_train_xg, y_train_xg)
# Set the clf to the best combination of parameters
ad_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
ad_tuned.fit(x_train_xg, y_train_xg)
AdaBoostRegressor(learning_rate=0.31000000000000005, n_estimators=20,
random_state=5)
## Function to calculate r2_score and RMSE on train and test data
pred_train_ada = ad_tuned.predict(x_train_xg)
pred_test_ada = ad_tuned.predict(x_test_xg)
pred_train = inv_transformation(pred_train_ada, normalized=True,source=y_train)
pred_test = inv_transformation(pred_test_ada, normalized=True,source=y_test)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_xg,Y_test=y_test_xg, p_train=pred_train_ada,p_test=pred_test_ada)
R-sqaure on training set : 0.6631270298660847 R-square on test set : 0.6409074990869514 RMSE on training set : 0.580407589659125 RMSE on test set : 0.5992432735651261 MAE on training set : 0.6397872922060861 MAE on test set : 0.639628462319425 MAPE on training set : 1.5205417964334254 MAPE on test set : 1.7484623874057734 Economic Cost on training set : -0.06107005931732684 Economic Cost on test set : -0.056621508191252244
# Metrics for the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list =get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.6631267934260567 R-square on test set : 0.64090496639373 RMSE on training set : 213278.6272175183 RMSE on test set : 219980.35155938612 MAE on training set : 387.8308875367581 MAE on test set : 387.54044165290907 MAPE on training set : 0.5821657392348406 MAPE on test set : 0.5798965342789075 Economic Cost on training set : 344407.69711786916 Economic Cost on test set : 344159.741295529
# Choose the type of classifier.
gb_tuned = GradientBoostingRegressor(random_state=1)
# Grid of parameters to choose from
parameters = {'n_estimators': np.arange(50,200,25),
'subsample':[0.5,0.7,0.8,0.9,1],
'max_features':[0.5,0.7,0.8,0.9,1],
'max_depth':[3,5,7,10]
}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj = GridSearchCV(gb_tuned, parameters, scoring=scorer,cv=5)
grid_obj = grid_obj.fit(x_train_xg, y_train_xg)
# Set the clf to the best combination of parameters
gb_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
gb_tuned.fit(x_train_xg, y_train_xg)
GradientBoostingRegressor(max_depth=7, max_features=0.5, n_estimators=150,
random_state=1, subsample=0.9)
## Function to calculate r2_score and RMSE on train and test data
pred_train_gb = gb_tuned.predict(x_train_xg)
pred_test_gb = gb_tuned.predict(x_test_xg)
pred_train = inv_transformation(pred_train_gb, normalized=True,source=y_train)
pred_test = inv_transformation(pred_test_gb, normalized=True,source=y_test)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_xg,Y_test=y_test_xg, p_train=pred_train_gb,p_test=pred_test_gb)
R-sqaure on training set : 0.952442000657079 R-square on test set : 0.816138547399174 RMSE on training set : 0.21807796620227607 RMSE on test set : 0.4287906862337683 MAE on training set : 0.3975371269951962 MAE on test set : 0.49883696106455927 MAPE on training set : 1.1342044684408048 MAPE on test set : 1.438907279408837 Economic Cost on training set : -0.033287625633552045 Economic Cost on test set : 0.030409926979106855
# Metrics for the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list =get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.952443437516542 R-square on test set : 0.8161381317482216 RMSE on training set : 80134.46264365243 RMSE on test set : 157407.35698563536 MAE on training set : 240.98040075759545 MAE on test set : 302.2395617783653 MAPE on training set : 0.35648789605100234 MAPE on test set : 0.41564814303152403 Economic Cost on training set : 269182.0789240855 Economic Cost on test set : 281746.3383317688
# Plotting observed and predicted values
fig, ax = plt.subplots(figsize=(8, 6))
y_pred = gb_tuned.predict(x_test_xg)
ax.scatter(y_test_scaled, y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test_scaled.min(), y_test_scaled.max()], [y_test_scaled.min(), y_test_scaled.max()], 'k--', lw=3)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title("Observed vs Predicted")
plt.grid()
plt.show()
# Choose the type of classifier.
xgb_tuned = XGBRegressor(random_state=1)
# Grid of parameters to choose from
parameters = {'n_estimators': np.arange(50,200,25),
'subsample':[0.5,0.7, 0.8, 0.9, 1],
'gamma':[0, 1, 3, 5],
'colsample_bytree':[0.5,0.7, 0.8, 0.9, 1],
'colsample_bylevel':[0.5,0.7, 0.8, 0.9, 1]
}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.r2_score)
# Run the grid search
grid_obj = GridSearchCV(xgb_tuned, parameters, scoring=scorer,cv=5)
grid_obj = grid_obj.fit(x_train_xg, y_train_xg)
# Set the clf to the best combination of parameters
xgb_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb_tuned.fit(x_train_xg, y_train_xg)
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=0.7,
colsample_bynode=1, colsample_bytree=0.7, enable_categorical=False,
gamma=1, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=125, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=1, reg_alpha=0,
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
## Function to calculate r2_score and RMSE on train and test data
pred_train_xg = xgb_tuned.predict(x_train_xg)
pred_test_xg = xgb_tuned.predict(x_test_xg)
pred_train = inv_transformation(pred_train_xg, normalized=True,source=y_train)
pred_test = inv_transformation(pred_test_xg, normalized=True,source=y_test)
## Function to calculate r2_score and RMSE on train and test data
metrics_list = get_model_score(Y_train=y_train_xg,Y_test=y_test_xg, p_train=pred_train_xg,p_test=pred_test_xg)
R-sqaure on training set : 0.9218389746004473 R-square on test set : 0.8071367445109984 RMSE on training set : 0.27957293395383015 RMSE on test set : 0.43916199230921793 MAE on training set : 0.44558687387504226 MAE on test set : 0.5069836394288172 MAPE on training set : 1.255225318686094 MAPE on test set : 1.4304257414080686 Economic Cost on training set : -0.005338794329396575 Economic Cost on test set : 0.0423194473018027
# Metrics for the reverse transformed data
## Function to calculate r2_score and RMSE on train and test data
metrics_list =get_model_score(Y_train=y_train,Y_test=y_test, p_train=pred_train,p_test=pred_test)
store_metrics(metrics_list)
R-sqaure on training set : 0.9218401659660709 R-square on test set : 0.8071349739820859 RMSE on training set : 102732.0408386038 RMSE on test set : 161215.1799968381 MAE on training set : 270.1089991194111 MAE on test set : 307.1760630127583 MAPE on training set : 0.3935431165293283 MAPE on test set : 0.4225093651363166 Economic Cost on training set : 278946.31908673205 Economic Cost on test set : 279705.1183318584
# Plotting observed and predicted values
fig, ax = plt.subplots(figsize=(8, 6))
y_pred = xgb_tuned.predict(x_test_xg)
ax.scatter(y_test_scaled, y_pred, edgecolors=(0, 0, 1))
ax.plot([y_test_scaled.min(), y_test_scaled.max()], [y_test_scaled.min(), y_test_scaled.max()], 'k--', lw=3)
ax.set_xlabel('Observed')
ax.set_ylabel('Predicted')
ax.set_title("Observed vs Predicted")
plt.grid()
plt.show()
comparison_frame = pd.DataFrame({'Model':['Naive Mean','OLS Regression','Tuned Ridge Regression','Tuned Lasso Regression',
'Tuned Random Forest','AdaBoost','Tuned_GradientBoost',
'Tuned XGBoost Regressor'],
'Train_r2': r2_train,'Test_r2': r2_test,
'Train_RMSE':rmse_train,'Test_RMSE':rmse_test,
'Train_MAE':mae_train,'Test_MAE':mae_test,
'Train_MAPE':mape_train,'Test_MAPE':mape_test,
'Train_Econ_Cost':econ_train,'Test_Econ_Cost':econ_test})
comparison_frame
| Model | Train_r2 | Test_r2 | Train_RMSE | Test_RMSE | Train_MAE | Test_MAE | Train_MAPE | Test_MAPE | Train_Econ_Cost | Test_Econ_Cost | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Naive Mean | -1.225 | -1.233 | 548140.141 | 548531.075 | 735.273 | 735.544 | 1.139 | 1.140 | 540625.886 | 541024.386 |
| 1 | OLS Regression | 0.770 | 0.725 | 176139.527 | 192470.526 | 318.553 | 340.979 | 0.431 | 0.478 | 282935.405 | 258491.796 |
| 2 | Tuned Ridge Regression | 0.783 | 0.735 | 171214.336 | 188972.959 | 315.496 | 339.144 | 0.428 | 0.475 | 280878.507 | 254720.829 |
| 3 | Tuned Lasso Regression | 0.618 | 0.556 | 227047.634 | 244535.867 | 352.317 | 371.874 | 0.478 | 0.507 | 278756.661 | 261760.952 |
| 4 | Tuned Random Forest | 0.971 | 0.562 | 62168.679 | 243007.139 | 185.263 | 375.063 | 0.254 | 0.526 | 265497.553 | 303250.187 |
| 5 | AdaBoost | 0.663 | 0.641 | 213278.627 | 219980.352 | 387.831 | 387.540 | 0.582 | 0.580 | 344407.697 | 344159.741 |
| 6 | Tuned_GradientBoost | 0.952 | 0.816 | 80134.463 | 157407.357 | 240.980 | 302.240 | 0.356 | 0.416 | 269182.079 | 281746.338 |
| 7 | Tuned XGBoost Regressor | 0.922 | 0.807 | 102732.041 | 161215.180 | 270.109 | 307.176 | 0.394 | 0.423 | 278946.319 | 279705.118 |
feature_names = x_train_xg.columns
importances = xgb_tuned.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
I also tried using K-Means clustering to find groups based on location in place of zipcodes. This model performed slightly worse. Another possibility is to get GPS coordinates for local schools, hospitals, fire departments, etc. and calculate each house distance to the nearest location of each type of institution. If that fails, I may have to use a seperate model to estimate the houses with the greatest residuals.
The XGBoost model is the best model for predicting prices in King County, Washington with a test r-square of 0.82 and RMSE of \$161,215. Houses with higher prices were associated with being furnished, in the northern part of the county, larger living areas, higher quality scores, a coastal view, more views, and outside of areas with medium price averages. These features can help sellers act to increase their house prices and buyers know what to consider when looking for lower prices.